Fusion proteins and methods thereof

ABSTRACT

The invention discloses oncogenic fusion proteins. The invention provides methods for treating gene-fusion based cancers.

This application is a divisional of U.S. patent application Ser. No. 16/246,167 filed on Jan. 11, 2019, which is a divisional of U.S. patent application Ser. No. 14/853,568 filed on Sep. 14, 2015, later issued as U.S. Pat. No. 10,208,296 on Feb. 19, 2019, which is a continuation of PCT International Application No. PCT/US2014/026351 filed on Mar. 13, 2014, which claims the benefit of and priority to U.S. Provisional Patent Application No. 61/793,086, filed on Mar. 15, 2013, the content of each of which is hereby incorporated by reference in its entirety.

All patents, patent applications and publications cited herein are hereby incorporated by reference in their entirety. The disclosures of these publications in their entireties are hereby incorporated by reference into this application.

GOVERNMENT SUPPORT

This invention was made with government support under grant CA101644 awarded by the National Institutes of Health. The Government has certain rights in the invention.

This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML file format and is hereby incorporated by reference in its entirety. Said XML copy, created on May 23, 2023, is named 0019240_01034US4_SL.xml and is 2,301,248 bytes in size.

BACKGROUND OF THE INVENTION

Glioblastoma multiforme (GBM) is the most common form of brain cancer and among the most incurable and lethal of all human cancers. The current standard of care includes surgery, chemotherapy, and radiation therapy. However, the prognosis of GBM remains uniformly poor. There are few available targeted therapies and none that specifically target GBM.

The target population of GBM patients who may carry EGFR gene fusions and would benefit from targeted inhibition of EGFR kinase activity is estimated to correspond to 6,000 patients per year worldwide.

SUMMARY OF THE INVENTION

The invention is based, at least in part, on the discovery of a highly expressed class of gene fusions in GBM, which join the receptor tyrosine kinase (RTK) domain of EGFR genes to the coiled-coil domain of septin proteins, such as Septin-14, or to a polypeptide comprising a phosphoserine phosphatase (PSPH) protein or a polypeptide comprising a Cullin-associated and neddylation-dissociated (CAND) protein. The invention is based, at least in part, on the finding that EGFR-SEPT fusions, EGFR-PSPH fusions, and EGFR-CAND fusions identify a subset of GBM patients who will benefit from targeted inhibition of the tyrosine kinase activity of EGFR. Fusions of EGFR genes identified in glioblastoma patients are useful therapeutic targets.

An aspect of the invention is directed to a purified fusion protein comprising a tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. In one embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified fusion protein comprising the tyrosine kinase domain of an EGFR protein fused 5′ to a polypeptide comprising the coiled-coil domain of a Septin protein. In one embodiment, the Septin protein is Septin-1, Septin-2, Septin-3, Septin-4, Septin-5, Septin-6, Septin-7, Septin-8, Septin-9, Septin-10, Septin-11, Septin-12, Septin-13, or Septin-14. In another embodiment, the Septin protein is Septin-14 (SEPT14). In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified fusion protein comprising the tyrosine kinase domain of an EGFR protein fused 5′ to a polypeptide comprising a phosphoserine phosphatase (PSPH) protein. In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified fusion protein comprising the tyrosine kinase domain of an EGFR protein fused 3′ to a polypeptide comprising a Cullin-associated and neddylation-dissociated (CAND) protein. In one embodiment, the CAND protein is CAND1, CAND2, or CAND3. In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified fusion protein encoded by an EGFR-SEPT14 nucleic acid, wherein EGFR-SEPT14 comprises a combination of exons 1-25 of EGFR located on human chromosome 7p11.2 spliced 5′ to a combination of exons 7-10 of SEPT14 located on human chromosome 7, wherein a genomic breakpoint occurs in any one of exons 1-25 of EGFR and any one of exons 7-10 of SEPT14. In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified fusion protein encoded by an EGFR-PSPH nucleic acid, wherein EGFR-PSPH comprises a combination of exons 1-25 of EGFR located on human chromosome 7p12 spliced 5′ to a combination of exons 1-10 of PSPH located on human chromosome 7p11.2, wherein a genomic breakpoint occurs in any one of exons 1-25 of EGFR and any one of exons 1-10 of PSPH. In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified fusion protein encoded by an EGFR-CAND1 nucleic acid, wherein EGFR-CAND1 comprises a combination of exons 1-25 of EGFR located on human chromosome 7p12 spliced 3′ to a combination of exons 1-16 of CAND1 located on human chromosome 12q14, wherein a genomic breakpoint occurs in any one of exons 1-25 of EGFR and any one of exons 1-16 of CAND1. In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a synthetic nucleic acid encoding the EGFR fusion proteins described above.

An aspect of the invention is directed to a purified EGFR-SEPT14 fusion protein comprising SEQ ID NO: 1 or 5. In one embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified EGFR-SEPT14 fusion protein having a genomic breakpoint comprising SEQ ID NO: 4. In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified EGFR-PSPH fusion protein comprising SEQ ID NO: 7 or 11. In another embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified EGFR-PSPH fusion protein having a genomic breakpoint comprising SEQ ID NO: 10. In one embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified EGFR-CAND1 fusion protein comprising SEQ ID NO: 13 or 8495. In one embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a purified EGFR-CAND1 fusion protein having a genomic breakpoint comprising SEQ ID NO: 15. In one embodiment, the purified fusion protein is essentially free of other human proteins.

An aspect of the invention is directed to a synthetic nucleic acid encoding an EGFR-SEPT14 fusion protein comprising SEQ ID NO: 2.

An aspect of the invention is directed to a synthetic nucleic acid encoding an EGFR-SEPT14 fusion protein having a genomic breakpoint comprising SEQ ID NO: 4.

An aspect of the invention is directed to a synthetic nucleic acid encoding an EGFR-PSPH fusion protein comprising SEQ ID NO: 8.

An aspect of the invention is directed to a synthetic nucleic acid encoding an EGFR-PSPH fusion protein having a genomic breakpoint comprising SEQ ID NO: 10.

An aspect of the invention is directed to a synthetic nucleic acid encoding an EGFR-CAND1 fusion protein comprising SEQ ID NO: 14.

An aspect of the invention is directed to a synthetic nucleic acid encoding an EGFR-CAND1 fusion protein having a genomic breakpoint comprising SEQ ID NO: 15.

An aspect of the invention is directed to an antibody or antigen-binding fragment thereof that specifically binds to a purified fusion protein comprising a tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. In one embodiment, the fusion protein is an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein. In another embodiment, the EGFR-SEPT fusion protein is EGFR-SEPT14. In one embodiment, the EGFR-SEPT fusion protein comprises the amino acid sequence of SEQ ID NO: 1, 3, or 5. In one embodiment, the EGFR-CAND fusion protein is EGFR-CAND1. In one embodiment, the EGFR-CAND fusion protein comprises the amino acid sequence of SEQ ID NO: 13, 16, or 8495.

An aspect of the invention is directed to an antibody or antigen-binding fragment thereof that specifically binds to a purified fusion protein comprising a tyrosine kinase domain of an EGFR protein fused to a polypeptide comprising the coiled-coil domain of a Septin protein. In another embodiment, the EGFR-SEPT fusion protein is EGFR-SEPT14. In one embodiment, the EGFR-SEPT fusion protein comprises the amino acid sequence of SEQ ID NO: 1, 3, or 5.

An aspect of the invention is directed to an antibody or antigen-binding fragment thereof that specifically binds to a purified fusion protein comprising a tyrosine kinase domain of an EGFR protein fused to a polypeptide comprising a phosphoserine phosphatase (PSPH) protein. In one embodiment, the EGFR-PSPH fusion protein comprises the amino acid sequence of SEQ ID NO: 7, 9, or 11.

An aspect of the invention is directed to an antibody or antigen-binding fragment thereof that specifically binds to a purified fusion protein comprising a tyrosine kinase domain of an EGFR protein fused to a polypeptide comprising a Cullin-associated and neddylation-dissociated (CAND) protein. In one embodiment, the EGFR-CAND fusion protein is EGFR-CAND1. In one embodiment, the EGFR-CAND fusion protein comprises the amino acid sequence of SEQ ID NO: 13, 16, or 8495.

An aspect of the invention is directed to a composition for decreasing the expression level or activity of a fusion protein in a subject comprising the tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein, the composition in an admixture of a pharmaceutically acceptable carrier comprising an inhibitor of the fusion protein. In one embodiment, the inhibitor comprises an antibody that specifically binds to an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, an EGFR-CAND fusion protein, or a fragment thereof; a small molecule that specifically binds to an EGFR protein; an antisense RNA or antisense DNA that decreases expression of an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein; an siRNA that specifically targets an EGFR-SEPT fusion gene, an EGFR-PSPH fusion gene, or an EGFR-CAND fusion gene; or a combination thereof. In another embodiment, the CAND protein is CAND1. In a further embodiment, the SEPT protein is SEPT14. In some embodiments, the small molecule that specifically binds to an EGFR protein comprises AZD4547, NVP-BGJ398, PD173074, NF449, TK1258, BIBF-1120, BMS-582664, AZD-2171, TSU68, AB1010, AP24534, E-7080, LY2874455, or a combination thereof.

An aspect of the invention is directed to a method for treating a gene-fusion associated cancer in a subject in need thereof, the method comprising administering to the subject an effective amount of an EGFR fusion molecule inhibitor. In one embodiment, the gene-fusion associated cancer comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In one embodiment, the EGFR fusion comprises an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. In one embodiment, the EGFR fusion protein is an EGFR-SEPT14 fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND1 fusion protein. In one embodiment, the inhibitor comprises an antibody that specifically binds to an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, an EGFR-CAND fusion protein, or a fragment thereof; a small molecule that specifically binds to an EGFR protein; an antisense RNA or antisense DNA that decreases expression of an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein; an siRNA that specifically targets an EGFR-SEPT fusion gene, an EGFR-PSPH fusion gene, or an EGFR-CAND fusion gene; or a combination thereof. In one embodiment, the small molecule that specifically binds to an EGFR protein comprises AZD4547, NVP-BGJ398, PD173074, NF449, TK1258, BIBF-1120, BMS-582664, AZD-2171, TSU68, AB1010, AP24534, E-7080, LY2874455, or a combination thereof.

An aspect of the invention is directed to a method of decreasing growth of a solid tumor in a subject in need thereof, the method comprising administering to the subject an effective amount of an EGFR fusion molecule inhibitor, wherein the inhibitor decreases the size of the solid tumor. In one embodiment, the subject is afflicted with a gene-fusion associated cancer. In one embodiment, the gene-fusion associated cancer comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In one embodiment, the solid tumor comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In one embodiment, the EGFR fusion comprises an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. In one embodiment, the EGFR fusion protein is an EGFR-SEPT14 fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND1 fusion protein. In one embodiment, the inhibitor comprises an antibody that specifically binds to an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, an EGFR-CAND fusion protein, or a fragment thereof; a small molecule that specifically binds to an EGFR protein; an antisense RNA or antisense DNA that decreases expression of an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein; an siRNA that specifically targets an EGFR-SEPT fusion gene, an EGFR-PSPH fusion gene, or an EGFR-CAND fusion gene; or a combination thereof. In one embodiment, the small molecule that specifically binds to an EGFR protein comprises AZD4547, NVP-BGJ398, PD173074, NF449, TK1258, BIBF-1120, BMS-582664, AZD-2171, TSU68, AB1010, AP24534, E-7080, LY2874455, or a combination thereof.

An aspect of the invention is directed to a method of reducing cell proliferation in a subject afflicted with a gene-fusion associated cancer, the method comprising administering to the subject an effective amount of an EGFR fusion molecule inhibitor, wherein the inhibitor decreases cell proliferation. In one embodiment, the gene-fusion associated cancer comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In one embodiment, the EGFR fusion comprises an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. In one embodiment, the EGFR fusion protein is an EGFR-SEPT14 fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND1 fusion protein. In one embodiment, the inhibitor comprises an antibody that specifically binds to an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, an EGFR-CAND fusion protein, or a fragment thereof; a small molecule that specifically binds to an EGFR protein; an antisense RNA or antisense DNA that decreases expression of an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein; an siRNA that specifically targets an EGFR-SEPT fusion gene, an EGFR-PSPH fusion gene, or an EGFR-CAND fusion gene; or a combination thereof. In one embodiment, the small molecule that specifically binds to an EGFR protein comprises AZD4547, NVP-BGJ398, PD173074, NF449, TK1258, BIBF-1120, BMS-582664, AZD-2171, TSU68, AB1010, AP24534, E-7080, LY2874455, or a combination thereof.

An aspect of the invention is directed to a diagnostic kit for determining whether a sample from a subject exhibits a presence of an EGFR fusion, the kit comprising at least one oligonucleotide that specifically hybridizes to an EGFR fusion, or a portion thereof. In one embodiment, the oligonucleotides comprise a set of nucleic acid primers or in situ hybridization probes. In another embodiment, the oligonucleotide comprises SEQ ID NOS 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, 89, or a combination thereof. In a further embodiment, the primers prime a polymerase reaction only when an EGFR fusion is present. In some embodiments, the fusion protein is an EGFR-SEPT14 fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND1 fusion protein. In other embodiments, the determining comprises gene sequencing, selective hybridization, selective amplification, gene expression analysis, or a combination thereof.

An aspect of the invention is directed to a diagnostic kit for determining whether a sample from a subject exhibits a presence of an EGFR fusion protein, the kit comprising an antibody that specifically binds to an EGFR fusion protein comprising SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 16, or 8495, wherein the antibody will recognize the protein only when an EGFR fusion protein is present. In one embodiment, the fusion protein is an EGFR-SEPT14 fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND1 fusion protein. In one embodiment, the subject is afflicted with a gene-fusion associated cancer. In one embodiment, the gene-fusion associated cancer comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma.

An aspect of the invention is directed to a method for detecting the presence of an EGFR fusion in a human subject. The method comprises obtaining a biological sample from the human subject; and detecting whether or not there is an EGFR fusion present in the subject. In one embodiment, the detecting comprises measuring EGFR fusion protein levels by ELISA using an antibody directed to SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 16, or 8495; western blot using an antibody directed to SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 16, or 8495; mass spectroscopy, isoelectric focusing, or a combination thereof.

An aspect of the invention is directed to a method for detecting the presence of an EGFR fusion in a human subject. The method comprises obtaining a biological sample from a human subject; and detecting whether or not there is a nucleic acid sequence encoding an EGFR fusion protein in the subject. In one embodiment, the nucleic acid sequence comprises any one of SEQ ID NOS: 2, 4, 8, 10, 14, and 15. In another embodiment, the detecting comprises using hybridization, amplification, or sequencing techniques to detect an EGFR fusion. In a further embodiment, the amplification uses primers comprising SEQ ID NOS 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, or 89. In some embodiments, the fusion protein is an EGFR-SEPT14 fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND1 fusion protein.
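
By way of non-limiting illustration only, sequencing-based detection of a fusion may be sketched as a search for reads that span the junction between the two fusion partners. The sketch below is an assumption-laden example and not the method of the invention: the function names (`junction_probe`, `read_spans_junction`, `count_junction_reads`), the probe half-length `k`, and the flank sequences are all hypothetical placeholders, and the placeholder sequences are not taken from any SEQ ID NO of the Sequence Listing.

```python
# Illustrative sketch only: exact-match search for junction-spanning reads.
# All sequences below are made-up placeholders, NOT real EGFR/SEPT14 sequence.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def junction_probe(upstream_flank: str, downstream_flank: str, k: int = 12) -> str:
    """Join the last k bases of the 5' partner to the first k bases of the 3' partner."""
    if len(upstream_flank) < k or len(downstream_flank) < k:
        raise ValueError("each flank must contain at least k bases")
    return upstream_flank[-k:].upper() + downstream_flank[:k].upper()


def read_spans_junction(read: str, probe: str) -> bool:
    """True if the read contains the junction probe on either strand."""
    read = read.upper()
    return probe in read or reverse_complement(probe) in read


def count_junction_reads(reads, probe: str) -> int:
    """Count the reads that span the junction."""
    return sum(read_spans_junction(r, probe) for r in reads)


# Hypothetical flanks on either side of a breakpoint.
UPSTREAM = "ACGTTGCAAGCTTGGATCCA"
DOWNSTREAM = "TTGACCGGTAACTCGAGTCA"

probe = junction_probe(UPSTREAM, DOWNSTREAM, k=8)
reads = [
    "GCAAGCTTGGATCCATTGACCGGTAAC",                      # spans the junction
    UPSTREAM,                                           # 5' partner only
    reverse_complement("GCAAGCTTGGATCCATTGACCGGTAAC"),  # spanning, opposite strand
]
print(count_junction_reads(reads, probe))
```

Exact substring matching is used here only for brevity; a practical analysis would typically tolerate mismatches and would rely on an established aligner.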

BRIEF DESCRIPTION OF THE FIGURES

To conform to the requirements for PCT patent applications, many of the figures presented herein are black and white representations of images originally created in color. In the below descriptions and the examples, the colored plots and images are described in terms of their appearance in black and white. The original color versions can be viewed in Frattini et al., (2013) Nature Genetics, 45(10):1141-49 (including the accompanying Supplementary Information available in the on-line version of the manuscript available on the Nature Genetics web site). For the purposes of the PCT, the contents of Frattini et al., (2013) Nature Genetics, 45(10):1141-49, including the accompanying “Supplementary Information,” are herein incorporated by reference.

FIG. 1A is a chromosome view of validated GBM genes scoring at the top of each of the three categories by MutComFocal. The plot shows mutated genes without significant copy number alterations (Mut, mutation %, frequency of mutations). Previously known GBM genes are indicated in green (light grey in black and white image), new and independently validated GBM genes are indicated in red (dark grey in black and white image).

FIG. 1B is a chromosome view of validated GBM genes scoring at the top of each of the three categories by MutComFocal. The plot shows mutated genes in regions of focal and recurrent amplifications (Amp-Mut, Amplification/mutation scores). Previously known GBM genes are indicated in green (light grey in black and white image), new and independently validated GBM genes are indicated in red (dark grey in black and white image).

FIG. 1C is a chromosome view of validated GBM genes scoring at the top of each of the three categories by MutComFocal. The plot shows mutated genes in regions of focal and recurrent deletions (Del-Mut, Deletion/mutation scores). Previously known GBM genes are indicated in green (light grey in black and white image), new and independently validated GBM genes are indicated in red (dark grey in black and white image).

FIGS. 2A-B show localization of altered residues in LZTR-1. FIG. 2A shows lysates from 293T cells transfected with vectors expressing LZTR-1 and the Flag-Cul3 wild type (WT), Flag-Cul3-dominant negative (DN) or the empty vector, which were immunoprecipitated with Flag antibody and assayed by western blot with the indicated antibodies. *, nonspecific band; left bracket indicates Cul3 polypeptides. The molecular weight is indicated on the right. FIG. 2B shows a homology model of the Kelch (green; grey in black and white image of left hand side of ribbon diagram), BTB (cyan; light grey in black and white image of center and right of ribbon diagram) and BACK (purple; dark grey in black and white image of center and right of ribbon diagram) domains of LZTR-1 with the Cul3 N-terminal domain (white) docked onto the putative binding site. GBM mutations are indicated in red (dark grey in black and white image; left hand side of ribbon diagram).

FIG. 2C. Sequence alignment of the six blades from the Kelch β-propeller domain. Each blade contains four core β-strands, labeled a, b, c, d. Conserved residues are highlighted in gray and residues mutated in GBM are shown in red. Insertions at the end of blades 5 and 6 are indicated in brackets. Figure discloses SEQ ID NO: 8478.

FIGS. 3A-B. Loss of CTNND2 drives mesenchymal transformation of GBM. 3a, Immunofluorescence staining of human brain cortex using δ-catenin antibody (red, left panel); nuclei are counterstained with DAPI (blue, right panel). 3b, Immunofluorescence staining of human primary GBM included in tissue microarrays (TMA) using δ-catenin antibody (red); nuclei are counterstained with DAPI (blue). A representative δ-catenin-positive and negative tumor is shown in the left and right panel, respectively.

FIGS. 3C-D. Loss of CTNND2 drives mesenchymal transformation of GBM. 3c, Kaplan-Meier analysis for glioma patients with low CTNND2 mRNA expression (≤2-fold, red line) compared with the rest of glioma (blue line). 3d, Kaplan-Meier analysis for glioma patients with low CTNND2 mRNA expression (≤2-fold) and decreased CTNND2 gene copy number (≤1) (red line) compared with the rest of glioma (blue line).

FIGS. 3E-F. Loss of CTNND2 drives mesenchymal transformation of GBM. 3e, Growth rate of U87 glioma cells transduced with a lentivirus expressing δ-catenin (squares) or the empty vector (circles, average of triplicate cultures). 3f, Expression of mesenchymal genes in glioma cells expressing δ-catenin or the empty vector (averages of triplicate quantitative RT-PCR). All error bars are SD. *, p≤0.005; **, p≤0.001.

FIGS. 3G-H. Loss of CTNND2 drives mesenchymal transformation of GBM. 3g, Immunofluorescence staining for βIII-tubulin (upper panels) and PSD95 (lower panels) in glioma cells expressing δ-catenin or the empty vector. 3h, Western blot using the indicated antibodies in glioma cells expressing δ-catenin or the empty vector. Vinculin is shown as control for loading.

FIG. 4A. EGFR-SEPT14 gene fusion identified by whole transcriptome sequencing. Split reads are shown aligning on the breakpoint. The predicted reading frame at the breakpoint is shown at the top with EGFR sequences in blue and SEPT14 in red. The amino acid sequence (top) is SEQ ID NO: 1; the nucleotide sequence (bottom) is SEQ ID NO: 2.

FIG. 4B. EGFR-SEPT14 gene fusion identified by whole transcriptome sequencing. (left panel), EGFR-SEPT14-specific PCR from cDNA derived from GBMs. Marker, 1 kb ladder. (right panel), Sanger sequencing chromatogram showing the reading frame at the breakpoint (SEQ ID NO: 4) and putative translation of the fusion protein (SEQ ID NO: 3) in the positive sample.

FIG. 4C. EGFR-SEPT14 gene fusion identified by whole transcriptome sequencing. EGFR-Septin14 fusion protein sequence (SEQ ID NO: 5) and schematics. Regions corresponding to EGFR and Septin14 are shown in blue (left hand side of diagram; grey in black and white image; sequence comprising “MRP . . . VIQ” amino acids of SEQ ID NO: 5) and red (right hand side of diagram; light grey in black and white image; sequence comprising “LQD . . . RKK” amino acids of SEQ ID NO: 5), respectively. The fusion joins the tyrosine kinase domain of EGFR and the coiled-coil domain of Septin14.

FIG. 4D. EGFR-SEPT14 gene fusion identified by whole transcriptome sequencing. Genomic fusion of EGFR exon 25 with intron 9 of SEPT14. In the fused mRNA, exon 24 of EGFR is spliced 5′ to exon 10 of SEPT14. Solid arrows indicate the position of the fusion genome primers that generate a fusion-specific PCR product in the GBM sample TCGA-27-1837.

FIG. 5A. Expression of EGFR-SEPT14 fusion promotes an aggressive phenotype and inhibition of EGFR kinase delays GBM growth in vivo. Growth rate of SNB19 glioma cells transduced with a lentivirus expressing EGFR-SEPT14, EGFRvIII, EGFR WT or the empty vector (average of triplicate cultures).

FIG. 5B. Expression of EGFR-SEPT14 fusion promotes an aggressive phenotype and inhibition of EGFR kinase delays GBM growth in vivo. Migration assay in SNB19 glioma cells transduced with a lentivirus expressing EGFR-SEPT14, EGFRvIII, EGFR WT or the empty vector.

FIG. 5C. Expression of EGFR-SEPT14 fusion promotes an aggressive phenotype and inhibition of EGFR kinase delays GBM growth in vivo. Quantification of the cell covered area for the experiments shown in FIG. 5B (average of triplicate cultures). All error bars are SD.

FIG. 5D. Expression of EGFR-SEPT14 fusion promotes an aggressive phenotype and inhibition of EGFR kinase delays GBM growth in vivo. In vivo inhibition of tumor growth by EGFR kinase inhibitors in glioma patient derived xenografts carrying EGFR-SEPT14 fusion but not wild type EGFR. T-C indicates the median difference in survival between drug treated and vehicle (control) treated mice.

FIG. 5E. Expression of EGFR-SEPT14 fusion promotes an aggressive phenotype and inhibition of EGFR kinase delays GBM growth in vivo. Kinetics of tumor growth for the same xenografts treated with Lapatinib or vehicle (control). All error bars are SD.

FIG. 6 shows the distribution of substitutions from whole exome data.

FIG. 7 shows dinucleotide distribution in mutated sites.

FIG. 8. Sequence alignment of selected LZTR-1 orthologs. Mutations detected in GBM are indicated in red above the aligned sequences. The LZTR-1 gene is present in most metazoans, including the sponge Amphimedon queenslandica, which is generally recognized as the most ancient surviving metazoan lineage¹⁷. LZTR-1 is also present in some near-metazoan unicellular protists, including Capsaspora owczarzaki (included in the Figure) and the choanoflagellates Salpingoeca rosetta and Monosiga brevicollis. These opisthokonts are key organisms for the study of the evolution of multicellularity, differentiation and cell-cell communication in animals and help in our understanding of the role of molecular pathways in cancer¹⁸. LZTR-1 has a characteristic Kelch-BTB-BACK-BTB-BACK domain architecture, and unlike the BTB-BACK-Kelch proteins⁸, there has been little, if any, duplication of the LZTR-1 gene since its appearance. Despite its name, LZTR-1 does not contain a leucine zipper region. Figure discloses SEQ ID NOS 8479-8482, respectively, in order of appearance.

FIG. 9. Sequence alignment of BTB-BACK domains. The two BTB-BACK domains of LZTR-1 are included along with the predicted secondary structure from HHpred⁶. The 3-box is the Cul3 binding element within the BACK domain. The secondary structure of KLHL3 (PDB ID 4HXI), KLHL11 (PDB ID 4AP2) and Gigaxonin (PDB ID 3HVE) are based on the crystal structures. The secondary structure of SPOP is based on a crystal structure for the BTB and 3-box region (PDB ID 3HTM) and HHpred predictions from the remainder of the BACK domain. Only the N-terminal half of the BACK domain from KLHL3, KLHL11 and Gigaxonin is included, as SPOP and LZTR-1 contain truncated versions of the BACK domain. Figure discloses SEQ ID NOS 8483-8488, respectively, in order of appearance.

FIG. 10A. Pattern of somatic mutations, CNVs and expression of CTNND2 in GBM. Schematic representation of identified somatic mutations in CTNND2 shown in the context of the known domain structure of the protein. Numbers refer to amino acid residues of the δ-catenin protein.

FIG. 10B. Pattern of somatic mutations, CNVs and expression of CTNND2 in GBM. Somatic deletions of CTNND2. Samples are sorted according to the focality of CTNND2 deletion. In the red-blue scale, white corresponds to normal (diploid) copy number, blue is deletion and red is gain.

FIG. 10C. Pattern of somatic mutations, CNVs and expression of CTNND2 in GBM. Pattern of expression of δ-catenin in the developing mouse brain (embryonic day 14.5), as determined by immunostaining. The highest levels of δ-catenin are detected in the cortical plate (CP) that contains differentiating neurons. IZ, intermediate zone; VZ/SVZ, ventricular zone/subventricular zone; LV, lateral ventricle.

FIG. 10D. Pattern of somatic mutations, CNVs and expression of CTNND2 in GBM. CTNND2 mRNA expression analysis from Atlas-TCGA samples shows that CTNND2 is significantly down-regulated in the mesenchymal subgroup. In the green-red scale, black is the median, green is down-regulation and red is up-regulation.

FIG. 11A. EGFR-PSPH gene fusion identified by whole transcriptome sequencing. Split reads are shown aligning on the breakpoint. The predicted reading frame at the breakpoint is shown at the top with EGFR sequences in blue (grey in black and white image; encompassing “SRR . . . VIQ” amino acids and “AGT . . . CAG” nucleotides) and PSPH in red (light grey in black and white image; encompassing “DAF . . . QQV” amino acids and “GAT . . . CAA” nucleotides). The amino acid sequence (top) is SEQ ID NO: 7; the nucleotide sequence (bottom) is SEQ ID NO: 8.

FIG. 11B. EGFR-PSPH gene fusion identified by whole transcriptome sequencing. (left panel), EGFR-PSPH-specific PCR from cDNA derived from GBMs. Marker, 1 kb ladder. (right panel), Sanger sequencing chromatogram showing the reading frame (SEQ ID NO: 10) at the breakpoint and putative translation of the fusion protein (SEQ ID NO: 9) in the positive sample.

FIG. 11C. EGFR-PSPH gene fusion identified by whole transcriptome sequencing. EGFR-PSPH fusion protein sequence (SEQ ID NO: 11) and schematics. Regions corresponding to EGFR and PSPH are shown in blue (grey in black and white image; left hand side of schematic; sequence comprising “MRP . . . VIQ” amino acids of SEQ ID NO: 11) and red (light grey in black and white image; right hand side of schematic, encompassing amino acids “DAF . . . LEE” of SEQ ID NO: 11), respectively. The fusion includes the tyrosine kinase domain of EGFR and the last 35 amino acids of PSPH.

FIG. 12A. NFASC-NTRK1 gene fusion identified by whole transcriptome sequencing. Split reads are shown aligning on the breakpoint. The predicted reading frame at the breakpoint is shown at the top with NFASC sequences in blue (grey in black and white image; encompassing “RVQ . . . GED” amino acids and “AGA . . . ATT” nucleotides) and NTRK1 in red (light grey in black and white image; encompassing “YTN . . . VGL” amino acids and “AGA . . . AAG” nucleotides). Figure discloses SEQ ID NOS 8489-8490, respectively, in order of appearance.

FIG. 12B. NFASC-NTRK1 gene fusion identified by whole transcriptome sequencing. (left panel), NFASC-NTRK1 specific PCR from cDNA derived from GBMs. Marker, 1 kb ladder. (right panel), Sanger sequencing chromatogram showing the reading frame (SEQ ID NO: 8491) at the breakpoint and putative translation of the fusion protein (SEQ ID NO: 8492) in the positive sample.

FIG. 12C. NFASC-NTRK1 gene fusion identified by whole transcriptome sequencing. NFASC-NTRK1 fusion protein sequence (SEQ ID NO: 8493) and schematics. Regions corresponding to NFASC and NTRK1 are shown in blue (grey in black and white image; sequence comprising “MAR . . . GED” amino acids) and red (light grey in black and white image; sequence comprising “YTN . . . VLG” amino acids), respectively. The fusion includes two of the five fibronectin-type III domains of neurofascin and the protein kinase domain of NTRK1.

FIG. 12D. NFASC-NTRK1 gene fusion identified by whole transcriptome sequencing. Genomic fusion of NFASC intron 9 with intron 21 of NTRK1. In the fused mRNA, exon 21 of NFASC is spliced 5′ to exon 10 of NTRK1. Solid arrows indicate the position of the fusion genome primers that generate a fusion specific PCR product in the GBM sample TCGA-06-5411.

FIG. 13 shows the expression measured by read depth from RNA-seq data. Note the very high level of expression in the regions of the genes implicated in the fusion events.

FIG. 14A. CAND1-EGFR gene fusion identified by whole transcriptome sequencing. Split reads are shown aligning on the breakpoint. The predicted reading frame at the breakpoint is shown at the top with CAND1 sequences in blue (grey in black and white image; sequence comprising “TSA . . . LSR” amino acids of SEQ ID NO: 13 and “TTA . . . CAG” nucleotides of SEQ ID NO: 14) and EGFR in red (light grey in black and white image; sequence comprising “CTG . . . VGX” amino acids of SEQ ID NO: 13 and “ATC . . . GGC” nucleotides of SEQ ID NO: 14). The amino acid sequence (top) is SEQ ID NO: 13; the nucleotide sequence (bottom) is SEQ ID NO: 14.

FIG. 14B. CAND1-EGFR gene fusion identified by whole transcriptome sequencing. (left panel), CAND1-EGFR specific PCR from cDNA derived from GBMs. Marker, 1 kb ladder. (right panel), Sanger sequencing chromatogram showing the reading frame at the breakpoint (SEQ ID NO: 15) and putative translation of the fusion protein (SEQ ID NO: 16) in the positive sample (boxed sequences). Figure also discloses SEQ ID NO: 8494.

FIG. 14C. CAND1-EGFR gene fusion identified by whole transcriptome sequencing. CAND1-EGFR fusion protein sequence (SEQ ID NO: 8495). Regions corresponding to CAND1 and EGFR are shown in blue (grey in black and white image; sequence comprising “MAS . . . LSR” amino acids of SEQ ID NO: 8495) and red (grey in black and white image; sequence comprising “CTG . . . IGA*” amino acids of SEQ ID NO: 8495), respectively.

FIG. 14D. CAND1-EGFR gene fusion identified by whole transcriptome sequencing. Genomic fusion of CAND1 intron 4 with intron 15 of EGFR. In the fused mRNA, exon 4 of CAND1 is spliced 5′ to exon 16 of EGFR.

FIG. 15 is a photographic image of a blot showing the interaction with Cul3 and protein stability of wild type and mutant LZTR-1. Lysates from SF188 glioma cells transfected with vectors expressing Myc-LZTR-1 and Flag-Cul3 or the empty vector were immunoprecipitated with Flag antibody and assayed by western blot with the indicated antibodies. *, non-specific band; arrowhead indicates neddylated Cul3.

FIG. 16A shows photographic images of a blot showing the interaction with Cul3 and protein stability of wild type and mutant LZTR-1. In vitro analysis of the interaction between Cul3 and LZTR-1 wild type and GBM related mutants. Left panel, In vitro translated Myc-LZTR-1 input. Right panel, In vitro translated Myc-LZTR-1 was mixed with Flag-Cul3 immunoprecipitated from transfected HEK-293T cells. Bound proteins were analyzed by western blot using the indicated antibodies.

FIG. 16B is a photographic image of a blot showing the interaction with Cul3 and protein stability of wild type and mutant LZTR-1. Steady state protein levels of wild type LZTR-1 and GBM-related mutants.

FIG. 16C is a photographic image of a blot (top) and a graph (bottom). Top panel, Cells transfected with LZTR-1 wild type or the R810W mutant were treated with cycloheximide for the indicated time. Bottom panel, Quantification of LZTR-1 wild type and LZTR-1-R810W protein from the experiment in the top panel.

FIG. 16D is a photographic image of a blot showing the interaction with Cul3 and protein stability of wild type and mutant LZTR-1. Semi-quantitative RT-PCR evaluation of LZTR-1 wild type and LZTR-1-R810W RNA expression in cells transfected as in FIG. 16C.

FIG. 17A is a graph showing functional analysis of LZTR-1 wild type and GBM associated mutants in GBM-derived cells. GSEA shows up-regulation of genes associated with the phenotype of “spherical cultures” of glioma cells in primary human GBM carrying mutations in the LZTR-1 gene [Enrichment Score (ES)=0.754; P (family-wise error rate, FWER)=0.000; q (false discovery rate, FDR)=0.000].

FIG. 17B is a graph showing functional analysis of LZTR-1 wild type and GBM associated mutants in GBM-derived cells. Sphere forming assay (left panel) and western blot analysis (right panel) of GBM-derived glioma spheres (#48) expressing vector or LZTR-1. Data are Mean±SD of triplicate samples (p=0.0036). Error bars are SD.

FIG. 17C is a linear regression plot of in vitro limiting dilution assay using GBM-derived glioma spheres #46 expressing vector or LZTR-1. The frequency of sphere forming cells was 8.49±1.04 and 1.44±0.05% in vector and LZTR-1 expressing cells, respectively (p=0.00795). Each data point represents the average of triplicates. Error bars are SD.

FIG. 17D shows a graph and photographic microscopy images showing functional analysis of LZTR-1 wild type and GBM associated mutants in GBM-derived cells. Left upper panels, Bright field microphotographs of GBM-derived line 46 cells six days after transduction with vector or LZTR-1 expressing lentivirus. Left lower panels, Bright field microphotographs of spheres from GBM-derived glioma cells #46 transduced with lentivirus expressing vector or LZTR-1 from the experiment in FIG. 17C. Right panel, The size of tumor spheres from cultures in FIG. 17C was determined by microscopy review after 14 days of culture. n=60 spheres from triplicates for each condition. Data are Mean±SD (p<0.0001). Error bars are SD.

FIG. 17E is a photographic image of a western blot analysis of GBM-derived cells #84 expressing vector or LZTR-1.

FIG. 17F is a linear regression plot of in vitro limiting dilution assay using GBM-derived line 84 expressing vector, LZTR-1, LZTR-1-R810W or LZTR-1-W437STOP. The frequency of sphere forming cells was 7.2±0.92 for vector, 1.48±0.09 for LZTR-1 wild type (p=0.0096); 7.82±0.99 for LZTR-1-R810W (p=0.2489); and 6.74±1.07 for LZTR-1-W437STOP (p=0.2269). Error bars are SD.

FIGS. 18A-B are photographic microscopy images showing expression of δ-catenin in neurons and δ-catenin driven loss of mesenchymal marker in GBM. FIG. 18A shows a pattern of expression of δ-catenin in the developing brain, as determined by immunostaining. Double immunofluorescence staining of brain cortex using δ-catenin antibody (red; dark grey in black and white image (center)) and βIII-tubulin (green; light grey in black and white image (right)); nuclei are counterstained with DAPI (blue; grey in black and white image (left)). FIG. 18B shows a pattern of expression of δ-catenin in the adult brain, as determined by immunostaining. Upper panels, Double immunofluorescence staining of brain cortex using δ-catenin antibody (red; dark grey in black and white image (center)) and MAP2 (green; light grey in black and white image (right)); nuclei are counterstained with DAPI (blue; grey in black and white image (left)). Lower panels, Double immunofluorescence staining of brain cortex using δ-catenin antibody (red; dark grey in black and white image) and GFAP (green; light grey in black and white image); nuclei are counterstained with DAPI (blue; grey in black and white image).

FIG. 18C is a photographic image of a western blot using the indicated antibodies for U87 cells expressing δ-catenin wild type, glioma-associated δ-catenin mutants or the empty vector. FBN, fibronectin. Vinculin is shown as control for loading.

FIGS. 19A-B show a functional analysis of δ-catenin in mesenchymal GBM. FIG. 19A is a photographic microscopy image of immunofluorescence for fibronectin, collagen-5α1 (COL5A1) and smooth muscle actin (SMA) in glioma spheres #48 four days after infection with lentiviruses expressing δ-catenin or the empty vector. Nuclei are counterstained with DAPI. FIG. 19B is a bar graph showing the quantification of fluorescence intensity for SMA, COL5A1 and FBN for cultures treated as in FIG. 19A. n=3 independent experiments; data indicate mean±SD.

FIG. 19C is a bar graph showing the quantification of fluorescence intensity for βIII-tubulin in cells #48 infected with lentiviruses expressing CTNND2 or the empty vector.

FIG. 19D shows photographic microscopy images of a time course analysis of βIII-tubulin expression in glioma spheres #48 transduced with lentiviruses expressing CTNND2 or the empty vector. Note the loss from the advanced culture of βIII-tubulin expressing cells.

FIGS. 19E-F are graphs. FIG. 19E shows a linear regression plot of in vitro limiting dilution assay using GBM-derived cells #48 expressing vector or δ-catenin. The frequency of sphere forming cells was 7.42±1.16 and 0.88±0.02 for vector and δ-catenin, respectively (p=0.0098). Error bars are SD. FIG. 19F shows a longitudinal analysis of bioluminescence imaging in mice injected intracranially with GBM-derived line 48 expressing vector or δ-catenin. n=3 mice for vector and 5 for δ-catenin. Data are mean±SEM of photon counts.

FIGS. 20A-E show the functional analysis of EGFR-SEPT14 fusion and effect of inhibition of EGFR kinase on glioma growth. FIG. 20A is a graph of a sphere forming assay in the absence of EGF of GBM-derived primary cells (#48) expressing vector, EGFR wild type, EGFRvIII or EGFR-SEPT14 fusion. Data are Mean±SD of triplicate samples (p=0.0051 and 0.027 for EGFR-SEPT14 fusion and EGFRvIII compared with vector, respectively). FIG. 20B is a western blot analysis of GBM-derived primary cells (#48) expressing vector, EGFRvIII or EGFR-SEPT14 fusion cultured in the presence of EGF. FIG. 20C is a photographic image of a blot showing GBM-derived cells (#48) expressing vector, EGFRvIII or EGFR-SEPT14 fusion that were cultured in the absence of EGF for 48 h and then stimulated with EGF 20 ng/ml for the indicated time. Cells were assayed by western blot using the indicated antibodies. FIG. 20D is a graph of GSEA showing up-regulation of STAT3 target genes in primary human GBM carrying the EGFR-SEPT14 fusion gene [Enrichment Score (ES)=0.738; P (family-wise error rate, FWER)=0.000; q (false discovery rate, FDR)=0.000]. FIG. 20E is a bar graph showing the survival of GBM-derived cells (#48) expressing vector, EGFR wild type, EGFRvIII or EGFR-SEPT14 fusion after treatment with lapatinib for 48 h at the indicated concentrations. Data are Mean±SD of triplicate samples.

FIG. 21 is a plot showing the number of mutations in TCGA samples harboring MutComFocal gene candidates. For a given gene G, the number of mutations M_G in samples harboring G was plotted as solid circles. The mean of M_G is also plotted as asterisks. Given the mean μ and standard deviation σ of the number of mutations in all TCGA samples, the 95% confidence interval of a sample being hyper-mutated (μ±1.96*σ) was plotted, showing that for all G, the mean of M_G falls well within the 95% confidence interval, demonstrating that MutComFocal genes do not tend to occur in hypermutated samples.

FIGS. 22A-B show the pattern of somatic mutations, CNVs and expression of CTNND2 in GBM. FIG. 22A shows photographic microscopy images of immunofluorescence staining of human primary GBM included in tissue microarrays (TMA) using δ-catenin antibody (red; dark grey in black and white image); nuclei are counterstained with DAPI (blue; grey in black and white image). Two representative δ-catenin-positive and two δ-catenin-negative tumors are shown in the upper and lower panels, respectively. FIG. 22B is a Western blot analysis of the expression of δ-catenin in a panel of GBM-derived glioma sphere cultures. Brain, normal human brain. Arrowhead indicates δ-catenin; asterisk, non-specific band. Vinculin is shown as control for loading.

FIGS. 23A-B show the effects of expression of δ-catenin in glioma cells. FIG. 23A is a western blot using the indicated antibodies in glioma cells expressing δ-catenin or the empty vector. Vinculin is shown as control for loading. FIG. 23B shows photographic microscopy images of U87 glioma cells transduced with a lentivirus expressing wild type δ-catenin, δ-catenin GBM-derived mutants or the empty vector and analyzed by fluorescence microscopy.

FIG. 23C is a bar graph that shows the effects of expression of δ-catenin in glioma cells. The number of cells displaying neural processes was scored. At least 200 cells/sample were analyzed.

FIG. 23D shows photographs of longitudinal bioluminescence imaging for one representative mouse injected intracranially with glioma sphere cells #48 transduced with lentivirus expressing CTNND2 (lower panels) or the empty vector (upper panels).

FIG. 24 is a heat map showing amplification surrounding the genomic neighborhood of EGFR, SEPT14, and PSPH among samples harboring EGFR fusions. Copy number was plotted as log2 ratio across the genomic region of chr7:55000000-56500000 for samples with EGFR-PSPH (top three rows) and EGFR-SEPT14 (bottom six rows). Genomic coordinates are also plotted for EGFR (blue; dark grey in black and white image), SEPT14 (yellow; light grey in black and white image), and PSPH (cyan; grey in black and white image).

FIG. 25 is a plot showing that the expression of EGFR-SEPT14 fusion promotes an aggressive phenotype and inhibition of EGFR kinase delays GBM growth in vivo. Growth rate of U87 glioma cells transduced with a lentivirus expressing EGFR-SEPT14, EGFRvIII, EGFR WT or the empty vector (average of triplicate cultures).

FIG. 26 is a map showing differential expression of GBM tumor samples harboring EGFR-SEPT14 fusions and EGFRvIII rearrangements. After filtering for statistical significance for differential expression, ten genes remained that distinguished the EGFR-SEPT14 phenotype from the EGFRvIII phenotype. Log2 expression was plotted as a heat map. Samples were hierarchically clustered by Euclidean distance using average linkage. This clustering demonstrates clear separation between EGFR-SEPT14 samples (red; dark grey in black and white image; corresponding to top half of intensity bar of left hand side) and EGFRvIII samples (green; light grey in black and white image; corresponding to bottom half of intensity bar of left hand side), confirming the unique molecular signature of the EGFR-SEPT14 gene fusion.

FIG. 27 shows gene fusions identified through RNA sequencing.

FIG. 28 shows genomic breakpoints of gene fusions detected through whole-exome DNA sequencing.

FIG. 29 shows that an EGFR fusion molecule can also include a tyrosine kinase domain of an EGFR protein fused to a protein encoded by any one of the genes listed therein.

FIG. 30 shows relative expression of EGFR fusion and wild-type transcripts. Expression is estimated using the depth of reads covering the fusion breakpoint or wild-type exon junctions excluded from the fusion transcript. These wild-type exon junctions include exons 25-26, 26-27, and 27-28.

FIG. 31 shows genomic breakpoints of gene fusions detected through whole-exome DNA sequencing.

FIG. 32 shows an analysis of the incidence of EGFR-SEPT14 and EGFR-PSPH gene fusions in GBM with or without the EGFRvIII rearrangement.

FIG. 33 shows enrichment of classical/mesenchymal subtype among samples with EGFR-SEPT14 or EGFR-PSPH.

FIG. 34 shows antibodies and concentrations used in immunofluorescence staining.

FIG. 35 shows antibodies and concentrations used for Western blots and immunoprecipitation assays.

FIG. 36 shows primers used for screening gene fusions from cDNA.

FIG. 37 shows primers used for genomic detection of gene fusions.

FIG. 38 shows primers used for semiquantitative RT-PCR to detect exogenous Myc-LZTR1 WT and mutant LZTR1-R810W.

DETAILED DESCRIPTION OF THE INVENTION

Gene fusions retaining the RTK-coding domain of EGFR are the most frequent gene fusion events in GBM. EGFR gene fusions occur in 7.6% of GBM patients and frequently implicate the SEPT14 gene as the 3′ partner in the fusion, with a consistent breakpoint at the RNA level. This makes the EGFR fusions highly manageable genetic alterations both diagnostically and therapeutically. In one embodiment, EGFR fusions enhance the proliferative and migratory capacity of glioma cells. In another embodiment, the EGFR fusions also confer sensitivity to EGFR inhibition to human GBM grown as mouse xenografts. Gene fusions encompassing RTK-coding genes are thus implicated in the pathogenesis of GBM and provide a strong rationale for the inclusion of GBM patients harboring EGFR fusions in clinical trials based on EGFR inhibitors. The target population of GBM patients who may carry EGFR gene fusions can benefit from targeted inhibition of EGFR kinase activity, and is estimated to correspond to 20,000 patients per year worldwide (~1,000 in USA/year).

Glioblastoma multiforme (GBM) is the most common form of brain tumor in adults, accounting for 12-15% of intracranial tumors and 50-60% of primary brain tumors. GBM is among the most lethal and incurable forms of human cancer. The history of successful targeted therapy of cancer largely coincides with the inactivation of recurrent and oncogenic gene fusions in hematological malignancies and recently in some types of epithelial cancer. Targeted therapies against common genetic alterations in GBM have not changed the dismal clinical outcome of the disease, most likely because they have systematically failed to eradicate the truly addicting oncoprotein activities of GBM. Recurrent chromosomal rearrangements resulting in the creation of oncogenic gene fusions have not been found in GBM.

GBM is among the most difficult forms of cancer to treat in humans (1). So far, the therapeutic approaches that have been tested against potentially important oncogenic targets in GBM have met with limited success (2-4). Recurrent chromosomal translocations leading to production of oncogenic fusion proteins are viewed as initiating and addicting events in the pathogenesis of human cancer, thus providing the most desirable molecular targets for cancer therapy (5, 6). Recurrent and oncogenic gene fusions have not been found in GBM. Chromosomal rearrangements are hallmarks of hematological malignancies but recently they have also been uncovered in subsets of solid tumors (breast, prostate, lung and colorectal carcinoma) (7, 8). Important and successful targeted therapeutic interventions for patients whose tumors carry these rearrangements have stemmed from the discovery of functional gene fusions, especially when the translocations involve kinase-coding genes (BCR-ABL, EML4-ALK) (9, 10). GBM, the most common malignant brain tumor, remains one of the most challenging forms of cancer to treat. The abundance of passenger mutations and large regions of copy number alterations has complicated the definition of the landscape of driver mutations in glioblastoma.

A hallmark of GBM is rampant chromosomal instability (CIN), which leads to aneuploidy (11). CIN and aneuploidy are early events in the pathogenesis of cancer (12). Without being bound by theory, genetic alterations targeting mitotic fidelity might be responsible for missegregation of chromosomes during mitosis, resulting in aneuploidy (13, 14).

Epidermal growth factor receptors (EGFR) are transmembrane glycoproteins and members of the protein kinase superfamily. This protein is a receptor for members of the epidermal growth factor family. EGFR is a cell surface protein that binds to epidermal growth factor. Binding of the protein to a ligand induces receptor dimerization and tyrosine autophosphorylation and leads to cell proliferation. Mutations that lead to EGFR overexpression or overactivity have been associated with a number of cancers, including lung cancer, anal cancers and glioblastoma multiforme.

Phosphoserine phosphatase (PSPH) is an enzyme responsible for the third and last step in L-serine formation. It catalyzes magnesium-dependent hydrolysis of L-phosphoserine and is also involved in an exchange reaction between L-serine and L-phosphoserine. Deficiency of this protein is thought to be linked to Williams syndrome.

The singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise.

The term “about” is used herein to mean approximately, in the region of, roughly, or around. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 20%.
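By way of illustration only, the 20% variance described above can be expressed as a short calculation. The following Python sketch is not part of the disclosed methods; the function names `about_range` and `is_about` are hypothetical and are introduced here solely for the example.

```python
def about_range(value, variance=0.20):
    """Return the (low, high) bounds covered by "about value".

    Assumes the variance is applied symmetrically above and below the
    stated value, as a fraction of that value (20% by default).
    """
    delta = abs(value) * variance
    return (value - delta, value + delta)


def is_about(candidate, stated, variance=0.20):
    """Return True if candidate falls within "about stated"."""
    low, high = about_range(stated, variance)
    return low <= candidate <= high


if __name__ == "__main__":
    # "about 100" spans roughly 80 to 120 under a 20% variance.
    print(about_range(100))
    print(is_about(85, 100), is_about(79, 100))
```

Under this reading, a stated value of “about 500 mM” would cover approximately 400 mM to 600 mM.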

DNA and Amino Acid Manipulation Methods and Purification Thereof

The practice of aspects of the present invention can employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Molecular Cloning: A Laboratory Manual, 3rd Ed., ed. by Sambrook (2001), Fritsch and Maniatis (Cold Spring Harbor Laboratory Press: 1989); DNA Cloning, Volumes I and II (D. N. Glover ed., 1985); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis et al. U.S. Pat. No. 4,683,195; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Transcription and Translation (B. D. Hames & S. J. Higgins eds. 1984); Culture Of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the series, Methods In Enzymology (Academic Press, Inc., N.Y.), specifically, Methods In Enzymology, Vols. 154 and 155 (Wu et al. eds.); Gene Transfer Vectors For Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory); Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV (D. M. Weir and C. C. Blackwell, eds., 1986); Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986). All patents, patent applications and references cited herein are incorporated by reference in their entireties.

One skilled in the art can obtain a protein in several ways, which include, but are not limited to, isolating the protein via biochemical means or expressing a nucleotide sequence encoding the protein of interest by genetic engineering methods.

A protein is encoded by a nucleic acid (including, for example, genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA). For example, it can be encoded by a recombinant nucleic acid of a gene. The proteins of the invention can be obtained from various sources and can be produced according to various techniques known in the art. For example, a nucleic acid that encodes a protein can be obtained by screening DNA libraries, or by amplification from a natural source. A protein can be a fragment or portion thereof. The nucleic acids encoding a protein can be produced via recombinant DNA technology and such recombinant nucleic acids can be prepared by conventional techniques, including chemical synthesis, genetic engineering, enzymatic techniques, or a combination thereof. For example, a fusion protein of the invention comprises a tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. For example, the fusion protein can be an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein. An example of an EGFR-SEPT fusion protein is EGFR-SEPT14. In one embodiment, an EGFR-SEPT14 fusion polypeptide can have the amino acid sequence shown in SEQ ID NO: 1, 3, or 5. An example of an EGFR-PSPH fusion protein is a polypeptide having the amino acid sequence shown in SEQ ID NO: 7, 9, or 11. An example of an EGFR-CAND fusion protein is EGFR-CAND1. In one embodiment, an EGFR-CAND1 fusion polypeptide can have the amino acid sequence shown in SEQ ID NO: 13, 16, or 8495.

The GenBank ID for the EGFR gene is 1956. Four isoforms are listed for EGFR, e.g., having GenBank Accession Nos. NP_005219 (corresponding nucleotide sequence NM_005228); NP_958439 (corresponding nucleotide sequence NM_201282); NP_958440 (corresponding nucleotide sequence NM_201283); NP_958441 (corresponding nucleotide sequence NM_201284). The nucleotide and amino acid sequences can be readily obtained by one of ordinary skill in the art using the listed accession numbers.

The GenBank ID for the SEPT14 gene is 346288. The GenBank Accession No. for SEPT14 is NP_997249 (corresponding nucleotide sequence NM_207366). The nucleotide and amino acid sequences can be readily obtained by one of ordinary skill in the art using the listed accession numbers.

The GenBank ID for the PSPH gene is 5723. The GenBank Accession No. for PSPH is NP_004568 (corresponding nucleotide sequence NM_004577). The nucleotide and amino acid sequences can be readily obtained by one of ordinary skill in the art using the listed accession numbers.

The GenBank ID for the CAND1 gene is 55832. The GenBank Accession No. for CAND1 is NP_060918 (corresponding nucleotide sequence NM_018448). The nucleotide and amino acid sequences can be readily obtained by one of ordinary skill in the art using the listed accession numbers.

As used herein, an “EGFR fusion molecule” can be a nucleic acid which encodes a polypeptide corresponding to a fusion protein comprising a tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. For example, an EGFR fusion molecule can include an EGFR-SEPT fusion (e.g., an EGFR-SEPT14 fusion polypeptide comprising the amino acid sequence shown in SEQ ID NO: 1, 3, or 5, or comprising the nucleic acid sequence shown in SEQ ID NO: 2 or 4); an EGFR-PSPH fusion (e.g., comprising the amino acid sequence shown in SEQ ID NO: 7, 9, or 11, or comprising the nucleic acid sequence shown in SEQ ID NO: 8 or 10); or an EGFR-CAND fusion (e.g., an EGFR-CAND1 fusion polypeptide comprising the amino acid sequence shown in SEQ ID NO: 13, 16, or 8495, or comprising the nucleic acid sequence shown in SEQ ID NO: 14 or 15). For example, an EGFR fusion molecule can include an EGFR-containing fusion comprising the amino acid sequence corresponding to GenBank Accession No. NP_005219, NP_958439, NP_958440, or NP_958441. An EGFR fusion molecule can also include a tyrosine kinase domain of an EGFR protein fused to a protein encoded by any one of the genes listed in FIG. 29. An EGFR fusion molecule can include a variant of the above described examples, such as a fragment thereof.

The nucleic acid can be any type of nucleic acid, including genomic DNA, complementary DNA (cDNA), recombinant DNA, synthetic or semi-synthetic DNA, as well as any form of corresponding RNA. A cDNA is a form of DNA artificially synthesized from a messenger RNA template and is used to produce gene clones. A synthetic DNA is free of modifications that can be found in cellular nucleic acids, including, but not limited to, histones and methylation. For example, a nucleic acid encoding an EGFR fusion molecule can comprise a recombinant nucleic acid encoding such a protein. The nucleic acid can be a non-naturally occurring nucleic acid created artificially (such as by assembling, cutting, ligating or amplifying sequences). It can be double-stranded or single-stranded.

The invention further provides for nucleic acids that are complementary to an EGFR fusion molecule. Complementary nucleic acids can hybridize to the nucleic acid sequence described above under stringent hybridization conditions. Non-limiting examples of stringent hybridization conditions include temperatures above 30° C., above 35° C., in excess of 42° C., and/or salinity of less than about 500 mM, or less than 200 mM. Hybridization conditions can be adjusted by the skilled artisan via modifying the temperature, salinity and/or the concentration of other reagents such as SDS or SSC.
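As a non-limiting illustration of complementarity, the reverse complement of a nucleic acid sequence can be derived computationally. The Python sketch below is offered only as an example and assumes plain A/C/G/T (or U) sequences; the names `reverse_complement` and `is_exact_complement` are hypothetical, and the sketch tests exact Watson-Crick pairing only. It does not model hybridization behavior, which depends on temperature, salinity and other conditions as noted above.

```python
# Translation table mapping each base to its Watson-Crick partner.
# Uracil (U) is mapped to A so that RNA input yields a DNA complement.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA (or RNA) sequence as DNA."""
    return sequence.translate(_COMPLEMENT)[::-1]


def is_exact_complement(strand_a, strand_b):
    """Return True if strand_b is the exact reverse complement of strand_a."""
    return reverse_complement(strand_a).upper() == strand_b.upper()


if __name__ == "__main__":
    # A short, made-up sequence used purely for illustration.
    print(reverse_complement("ATGC"))
    print(is_exact_complement("ATGC", "GCAT"))
```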

According to the invention, protein variants can include amino acid sequence modifications. For example, amino acid sequence modifications fall into one or more of three classes: substitutional, insertional or deletional variants. Insertions can include amino and/or carboxyl terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Insertions ordinarily will be smaller insertions than those of amino or carboxyl terminal fusions, for example, on the order of one to four residues. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence. These variants ordinarily are prepared by site-specific mutagenesis of nucleotides in the DNA encoding the protein, thereby producing DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture.
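The three classes of sequence modification can be pictured as simple string operations on a one-letter amino acid sequence. The following Python sketch is illustrative only; the helper names (`substitute`, `insert_after`, `delete`), the 1-based position convention, and the five-residue example sequence are assumptions made for this example and do not correspond to any sequence of the Sequence Listing.

```python
def _check_position(sequence, position):
    """Validate a 1-based residue position within the sequence."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")


def substitute(sequence, position, residue):
    """Substitutional variant: replace the residue at a 1-based position."""
    _check_position(sequence, position)
    return sequence[:position - 1] + residue + sequence[position:]


def insert_after(sequence, position, residues):
    """Insertional variant: insert residues after a 1-based position."""
    _check_position(sequence, position)
    return sequence[:position] + residues + sequence[position:]


def delete(sequence, position, count=1):
    """Deletional variant: remove `count` residues starting at a position."""
    _check_position(sequence, position)
    return sequence[:position - 1] + sequence[position - 1 + count:]


if __name__ == "__main__":
    toy = "MKTAY"  # made-up sequence for illustration
    print(substitute(toy, 2, "R"))
    print(insert_after(toy, 2, "GG"))
    print(delete(toy, 2, 2))
```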

In one embodiment, an EGFR fusion molecule comprises a protein or polypeptide encoded by a nucleic acid sequence encoding an EGFR fusion molecule, such as the sequences shown in SEQ ID NOS: 2, 4, 8, 10, 14, or 15. In some embodiments, the nucleic acid sequence encoding an EGFR fusion molecule is about 70%, about 75%, about 80%, about 85%, about 90%, about 93%, about 95%, about 97%, about 98%, or about 99% identical to SEQ ID NOS: 2, 4, 8, 10, 14, or 15. In another embodiment, the polypeptide can be modified, such as by glycosylations and/or acetylations and/or chemical reaction or coupling, and can contain one or several non-natural or synthetic amino acids. An example of an EGFR fusion molecule is the polypeptide having the amino acid sequence shown in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. In some embodiments, the EGFR fusion molecule that is a polypeptide is about 70%, about 75%, about 80%, about 85%, about 90%, about 93%, about 95%, about 97%, about 98%, or about 99% identical to SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. In another embodiment, an EGFR fusion molecule can be a fragment of an EGFR fusion protein. For example, the EGFR fusion molecule can encompass any portion of at least about 8 consecutive amino acids of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. The fragment can comprise at least about 10 amino acids, at least about 20 amino acids, at least about 30 amino acids, at least about 40 amino acids, at least about 50 amino acids, at least about 60 amino acids, or at least about 75 amino acids of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. Fragments include all possible amino acid lengths between about 8 and about 100 amino acids, for example, lengths between about 10 and about 100 amino acids, between about 15 and about 100 amino acids, between about 20 and about 100 amino acids, between about 35 and about 100 amino acids, between about 40 and about 100 amino acids, between about 50 and about 100 amino acids, between about 70 and about 100 amino acids, between about 75 and about 100 amino acids, or between about 80 and about 100 amino acids. Fragments include all possible amino acid lengths between about 100 and 800 amino acids, for example, lengths between about 125 and 800 amino acids, between about 150 and 800 amino acids, between about 175 and 800 amino acids, between about 200 and 800 amino acids, between about 225 and 800 amino acids, between about 250 and 800 amino acids, between about 275 and 800 amino acids, between about 300 and 800 amino acids, between about 325 and 800 amino acids, between about 350 and 800 amino acids, between about 375 and 800 amino acids, between about 400 and 800 amino acids, between about 425 and 800 amino acids, between about 450 and 800 amino acids, between about 475 and 800 amino acids, between about 500 and 800 amino acids, between about 525 and 800 amino acids, between about 550 and 800 amino acids, between about 575 and 800 amino acids, between about 600 and 800 amino acids, between about 625 and 800 amino acids, between about 650 and 800 amino acids, between about 675 and 800 amino acids, between about 700 and 800 amino acids, between about 725 and 800 amino acids, between about 750 and 800 amino acids, or between about 775 and 800 amino acids.
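For illustration only, percent identity and fragment enumeration as discussed above can be sketched in a few lines of Python. The sketch makes two simplifying assumptions that are not taken from the disclosure: the two sequences being compared are already aligned and of equal length (in practice a sequence alignment step would come first), and fragments are contiguous stretches of the parent sequence. The function names and the ten-residue example sequences are hypothetical and do not correspond to any SEQ ID NO.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two aligned sequences.

    Assumes both sequences are pre-aligned and of equal, non-zero length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def fragments(sequence, min_len=8, max_len=None):
    """Yield every contiguous fragment with min_len <= length <= max_len."""
    longest = len(sequence) if max_len is None else min(max_len, len(sequence))
    for length in range(min_len, longest + 1):
        for start in range(len(sequence) - length + 1):
            yield sequence[start:start + length]


if __name__ == "__main__":
    reference = "ACDEFGHIKL"  # made-up sequence for illustration
    variant = "ACDEFGHIKV"    # differs from the reference at one position
    print(percent_identity(reference, variant))
    print(list(fragments(reference, min_len=8)))
```

For a sequence of length n, this enumeration produces n - k + 1 fragments of each length k, so the number of candidate fragments grows roughly quadratically with sequence length.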

Chemical Synthesis. Nucleic acid sequences encoding an EGFR fusion molecule can be synthesized, in whole or in part, using chemical methods known in the art. Alternatively, a polypeptide can be produced using chemical methods to synthesize its amino acid sequence, such as by direct peptide synthesis using solid-phase techniques. Protein synthesis can either be performed using manual techniques or by automation. Automated synthesis can be achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer).

Optionally, polypeptide fragments can be separately synthesized and combined using chemical methods to produce a full-length molecule. For example, these methods can be utilized to synthesize a fusion protein of the invention. In one embodiment, a fusion protein of the invention comprises a tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. For example, the fusion protein can be an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein. An example of an EGFR-SEPT fusion protein is EGFR-SEPT14. In one embodiment, an EGFR-SEPT14 fusion polypeptide can have the amino acid sequence shown in SEQ ID NO: 1, 3, or 5. An example of an EGFR-PSPH fusion protein is a polypeptide having the amino acid sequence shown in SEQ ID NO: 7, 9, or 11. An example of an EGFR-CAND fusion protein is EGFR-CAND1. In one embodiment, an EGFR-CAND1 fusion polypeptide can have the amino acid sequence shown in SEQ ID NO: 13, 16, or 8495.

Obtaining, Purifying and Detecting EGFR fusion molecules. A polypeptide encoded by a nucleic acid, such as a nucleic acid encoding an EGFR fusion molecule, or a variant thereof, can be obtained by purification from human cells expressing a protein or polypeptide encoded by such a nucleic acid. Non-limiting purification methods include size exclusion chromatography, ammonium sulfate fractionation, ion exchange chromatography, affinity chromatography, and preparative gel electrophoresis.

A synthetic polypeptide can be substantially purified via high performance liquid chromatography (HPLC), such as ion exchange chromatography (IEX-HPLC). The composition of a synthetic polypeptide, such as an EGFR fusion molecule, can be confirmed by amino acid analysis or sequencing.

Other constructions can also be used to join a nucleic acid sequence encoding a polypeptide/protein of the claimed invention to a nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble proteins. Such purification facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAG extension/affinity purification system (Immunex Corp., Seattle, Wash.). Including cleavable linker sequences (i.e., those specific for Factor Xa or enterokinase (Invitrogen, San Diego, Calif.)) between the purification domain and a polypeptide encoded by a nucleic acid of the invention also can be used to facilitate purification. For example, the skilled artisan can use an expression vector encoding 6 histidine residues that precede a thioredoxin or an enterokinase cleavage site in conjunction with a nucleic acid of interest. The histidine residues facilitate purification by immobilized metal ion affinity chromatography, while the enterokinase cleavage site provides a means for purifying the polypeptide encoded by, for example, an EGFR-SEPT, EGFR-CAND, EGFR-PSPH, or EGFR-containing, nucleic acid.

Host cells which contain a nucleic acid encoding an EGFR fusion molecule, and which subsequently express the same, can be identified by various procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations and protein bioassay or immunoassay techniques which include membrane, solution, or chip-based technologies for the detection and/or quantification of nucleic acid or protein. For example, the presence of a nucleic acid encoding an EGFR fusion molecule can be detected by DNA-DNA or DNA-RNA hybridization or amplification using probes or fragments of nucleic acids encoding the same. In one embodiment, a nucleic acid fragment of an EGFR fusion molecule can encompass any portion of at least about 8 consecutive nucleotides of SEQ ID NOS: 2, 8, or 14. In another embodiment, the fragment can comprise at least about 10 consecutive nucleotides, at least about 15 consecutive nucleotides, at least about 20 consecutive nucleotides, or at least about 30 consecutive nucleotides of SEQ ID NOS: 2, 8, or 14. Fragments can include all possible nucleotide lengths between about 8 and about 100 nucleotides, for example, lengths between about 15 and about 100 nucleotides, or between about 20 and about 100 nucleotides. Nucleic acid amplification-based assays involve the use of oligonucleotides selected from sequences encoding an EGFR fusion molecule nucleic acid, or EGFR fusion molecule nucleic acid to detect transformants which contain a nucleic acid encoding a protein or polypeptide of the same.

Protocols are known in the art for detecting and measuring the expression of a polypeptide encoded by a nucleic acid, such as a nucleic acid encoding an EGFR fusion molecule, using either polyclonal or monoclonal antibodies specific for the polypeptide. Non-limiting examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay using monoclonal antibodies reactive to two non-interfering epitopes on a polypeptide encoded by a nucleic acid, such as a nucleic acid encoding an EGFR fusion molecule, can be used, or a competitive binding assay can be employed.

Labeling and conjugation techniques are known by those skilled in the art and can be used in various nucleic acid and amino acid assays. Methods for producing labeled hybridization or PCR probes for detecting sequences related to nucleic acid sequences encoding a protein, such as EGFR fusion molecule, include, but are not limited to, oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide. Alternatively, nucleic acid sequences, such as nucleic acids encoding an EGFR fusion molecule, can be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and can be used to synthesize RNA probes in vitro by addition of labeled nucleotides and an appropriate RNA polymerase such as T7, T3, or SP6. These procedures can be conducted using a variety of commercially available kits (Amersham Pharmacia Biotech, Promega, and US Biochemical). Suitable reporter molecules or labels which can be used for ease of detection include radionuclides, enzymes, and fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, and/or magnetic particles.

A fragment can be a fragment of a protein, such as an EGFR fusion protein. For example, a fragment of an EGFR fusion can encompass any portion of at least about 8 consecutive amino acids of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. The fragment can comprise at least about 10 consecutive amino acids, at least about 20 consecutive amino acids, at least about 30 consecutive amino acids, at least about 40 consecutive amino acids, at least about 50 consecutive amino acids, at least about 60 consecutive amino acids, at least about 70 consecutive amino acids, at least about 75 consecutive amino acids, at least about 80 consecutive amino acids, at least about 85 consecutive amino acids, at least about 90 consecutive amino acids, at least about 95 consecutive amino acids, at least about 100 consecutive amino acids, at least about 200 consecutive amino acids, at least about 300 consecutive amino acids, at least about 400 consecutive amino acids, at least about 500 consecutive amino acids, at least about 600 consecutive amino acids, at least about 700 consecutive amino acids, or at least about 800 consecutive amino acids of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. Fragments include all possible amino acid lengths between about 8 and about 100 amino acids, for example, lengths between about 10 and about 100 amino acids, between about 15 and about 100 amino acids, between about 20 and about 100 amino acids, between about 35 and about 100 amino acids, between about 40 and about 100 amino acids, between about 50 and about 100 amino acids, between about 70 and about 100 amino acids, between about 75 and about 100 amino acids, or between about 80 and about 100 amino acids.

Cell Transfection

Host cells transformed with a nucleic acid sequence of interest can be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The polypeptide produced by a transformed cell can be secreted or contained intracellularly depending on the sequence and/or the vector used. Expression vectors containing a nucleic acid sequence, such as a nucleic acid encoding an EGFR fusion molecule, can be designed to contain signal sequences which direct secretion of soluble polypeptide molecules encoded by the nucleic acid. Cell transfection and culturing methods are described in more detail below.

A eukaryotic expression vector can be used to transfect cells in order to produce proteins encoded by nucleotide sequences of the vector, e.g., those encoding an EGFR fusion molecule. Mammalian cells can be made to contain an expression vector (for example, one that contains a nucleic acid encoding a fusion protein comprising a tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein) by introducing the expression vector into an appropriate host cell via methods known in the art.

A host cell strain can be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed polypeptide encoded by a nucleic acid, in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves a “prepro” form of the polypeptide also can be used to facilitate correct insertion, folding and/or function. Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities (e.g., CHO, HeLa, MDCK, HEK293, and WI38), are available from the American Type Culture Collection (ATCC; 10801 University Boulevard, Manassas, Va. 20110-2209) and can be chosen to ensure the correct modification and processing of the foreign protein.

An exogenous nucleic acid can be introduced into a cell via a variety of techniques known in the art, such as lipofection, microinjection, calcium phosphate or calcium chloride precipitation, DEAE-dextran-mediated transfection, or electroporation. Electroporation is carried out at appropriate voltage and capacitance to result in entry of the DNA construct(s) into cells of interest (such as glioma cells (cell line SF188), neuroblastoma cells (cell lines IMR-32, SK-N-SH, SH-F and SH-N), astrocytes and the like). Other transfection methods also include modified calcium phosphate precipitation, polybrene precipitation, liposome fusion, and receptor-mediated gene delivery.

Cells that will be genetically engineered can be primary and secondary cells obtained from various tissues, and include cell types which can be maintained and propagated in culture. Non-limiting examples of primary and secondary cells include epithelial cells, neural cells, endothelial cells, glial cells, fibroblasts, muscle cells (such as myoblasts), keratinocytes, formed elements of the blood (e.g., lymphocytes, bone marrow cells), and precursors of these somatic cell types.

Vertebrate tissue can be obtained by methods known to one skilled in the art, such as a punch biopsy or other surgical methods of obtaining a tissue source of the primary cell type of interest. In one embodiment, a punch biopsy or removal (e.g., by aspiration) can be used to obtain a source of cancer cells (for example, glioma cells, neuroblastoma cells, and the like). A mixture of primary cells can be obtained from the tissue, using methods readily practiced in the art, such as explanting or enzymatic digestion (for example, using enzymes such as pronase, trypsin, collagenase, elastase, dispase, and chymotrypsin). Biopsy methods have also been described in U.S. Pat. No. 7,419,661 and PCT application publication WO 2001/32840, each of which is hereby incorporated by reference.

Primary cells can be acquired from the individual to whom the genetically engineered primary or secondary cells are administered. However, primary cells can also be obtained from a donor, other than the recipient, of the same species. The cells can also be obtained from another species (for example, rabbit, cat, mouse, rat, sheep, goat, dog, horse, cow, bird, or pig). Primary cells can also include cells from an isolated or purified vertebrate tissue source grown attached to a tissue culture substrate (for example, flask or dish) or grown in a suspension; cells present in an explant derived from tissue; both of the aforementioned cell types plated for the first time; and cell culture suspensions derived from these plated cells. Secondary cells can be plated primary cells that are removed from the culture substrate and replated, or passaged, in addition to cells from the subsequent passages. Secondary cells can be passaged one or more times. These primary or secondary cells can contain expression vectors having a gene that encodes an EGFR fusion molecule.

Cell Culturing

Various culturing parameters can be used with respect to the host cell being cultured. Appropriate culture conditions for mammalian cells are well known in the art (Cleveland W L, et al., J Immunol Methods, 1983, 56(2): 221-234) or can be determined by the skilled artisan (see, for example, Animal Cell Culture: A Practical Approach 2nd Ed., Rickwood, D. and Hames, B. D., eds. (Oxford University Press: New York, 1992)). Cell culturing conditions can vary according to the type of host cell selected. Commercially available medium can be utilized. Non-limiting examples of medium include, for example, Minimal Essential Medium (MEM, Sigma, St. Louis, Mo.); Dulbecco's Modified Eagle's Medium (DMEM, Sigma); Ham's F10 Medium (Sigma); HyClone cell culture medium (HyClone, Logan, Utah); RPMI-1640 Medium (Sigma); and chemically-defined (CD) media, which are formulated for various cell types, e.g., CD-CHO Medium (Invitrogen, Carlsbad, Calif.).

The cell culture media can be supplemented as necessary with supplementary components or ingredients, including optional components, in appropriate concentrations or amounts, as necessary or desired. Cell culture medium solutions provide at least one component from one or more of the following categories: (1) an energy source, usually in the form of a carbohydrate such as glucose; (2) all essential amino acids, and usually the basic set of twenty amino acids plus cysteine; (3) vitamins and/or other organic compounds required at low concentrations; (4) free fatty acids or lipids, for example linoleic acid; and (5) trace elements, where trace elements are defined as inorganic compounds or naturally occurring elements that can be required at very low concentrations, usually in the micromolar range.

The medium also can be supplemented electively with one or more components from any of the following categories: (1) salts, for example, magnesium, calcium, and phosphate; (2) hormones and other growth factors such as serum, insulin, transferrin, and epidermal growth factor; (3) protein and tissue hydrolysates, for example peptone or peptone mixtures which can be obtained from purified gelatin, plant material, or animal byproducts; (4) nucleosides and bases such as adenosine, thymidine, and hypoxanthine; (5) buffers, such as HEPES; (6) antibiotics, such as gentamycin or ampicillin; (7) cell protective agents, for example pluronic polyol; and (8) galactose. In one embodiment, soluble factors can be added to the culturing medium.

The mammalian cell culture that can be used with the present invention is prepared in a medium suitable for the type of cell being cultured. In one embodiment, the cell culture medium can be any one of those previously discussed (for example, MEM) that is supplemented with serum from a mammalian source (for example, fetal bovine serum (FBS)). In another embodiment, the medium can be a conditioned medium to sustain the growth of host cells.

Three-dimensional cultures can be formed from agar (such as Gey's Agar), hydrogels (such as matrigel, agarose, and the like; Lee et al., (2004) Biomaterials 25: 2461-2466) or polymers that are cross-linked. These polymers can comprise natural polymers and their derivatives, synthetic polymers and their derivatives, or a combination thereof. Natural polymers can be anionic polymers, cationic polymers, amphipathic polymers, or neutral polymers. Non-limiting examples of anionic polymers can include hyaluronic acid, alginic acid (alginate), carrageenan, chondroitin sulfate, dextran sulfate, and pectin. Some examples of cationic polymers include, but are not limited to, chitosan or polylysine. (Peppas et al., (2006) Adv Mater. 18: 1345-60; Hoffman, A. S., (2002) Adv Drug Deliv Rev. 43: 3-12; Hoffman, A. S., (2001) Ann NY Acad Sci 944: 62-73). Examples of amphipathic polymers can include, but are not limited to, collagen, gelatin, fibrin, and carboxymethyl chitin. Non-limiting examples of neutral polymers can include dextran, agarose, or pullulan. (Peppas et al., (2006) Adv Mater. 18: 1345-60; Hoffman, A. S., (2002) Adv Drug Deliv Rev. 43: 3-12; Hoffman, A. S., (2001) Ann NY Acad Sci 944: 62-73).

Cells to be cultured can harbor introduced expression vectors, such as plasmids. The expression vector constructs can be introduced via transformation, microinjection, transfection, lipofection, electroporation, or infection. The expression vectors can contain coding sequences, or portions thereof, encoding the proteins for expression and production. Expression vectors containing sequences encoding the produced proteins and polypeptides, as well as the appropriate transcriptional and translational control elements, can be generated using methods well known to and practiced by those skilled in the art. These methods include synthetic techniques, in vitro recombinant DNA techniques, and in vivo genetic recombination which are described in J. Sambrook et al., 2001, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview, N.Y. and in F. M. Ausubel et al., 1989, Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.

EGFR Fusion Molecule Inhibitors

The invention provides methods for use of compounds that decrease the expression level or activity of an EGFR fusion molecule in a subject. In addition, the invention provides methods for using compounds for the treatment of a gene-fusion associated cancer. In one embodiment, the gene-fusion associated cancer comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma.

As used herein, an “EGFR fusion molecule inhibitor” refers to a compound that interacts with an EGFR fusion molecule of the invention and modulates its activity and/or its expression. For example, the compound can decrease the activity or expression of an EGFR fusion molecule. The compound can be an antagonist of an EGFR fusion molecule (e.g., an EGFR fusion molecule inhibitor). Some non-limiting examples of EGFR fusion molecule inhibitors include peptides (such as peptide fragments comprising an EGFR fusion molecule, or antibodies or fragments thereof), small molecules, and nucleic acids (such as siRNA or antisense RNA specific for a nucleic acid comprising an EGFR fusion molecule). Antagonists of an EGFR fusion molecule decrease the amount or the duration of the activity of an EGFR fusion protein. In one embodiment, the fusion protein comprises a tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein (e.g., EGFR-SEPT (such as EGFR-SEPT14), EGFR-PSPH, or EGFR-CAND (such as EGFR-CAND1)). Antagonists include proteins, nucleic acids, antibodies, small molecules, or any other molecule which decreases the activity of an EGFR fusion molecule.

The term “modulate,” as it appears herein, refers to a change in the activity or expression of an EGFR fusion molecule. For example, modulation can cause a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of an EGFR fusion molecule, such as an EGFR fusion protein.

In one embodiment, an EGFR fusion molecule inhibitor can be a peptide fragment of an EGFR fusion protein that binds to the protein itself.

For example, the EGFR fusion polypeptide can encompass any portion of at least about 8 consecutive amino acids of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. The fragment can comprise at least about 10 consecutive amino acids, at least about 20 consecutive amino acids, at least about 30 consecutive amino acids, at least about 40 consecutive amino acids, at least about 50 consecutive amino acids, at least about 60 consecutive amino acids, at least about 70 consecutive amino acids, at least about 75 consecutive amino acids, at least about 80 consecutive amino acids, at least about 85 consecutive amino acids, at least about 90 consecutive amino acids, at least about 95 consecutive amino acids, at least about 100 consecutive amino acids, at least about 200 consecutive amino acids, at least about 300 consecutive amino acids, at least about 400 consecutive amino acids, at least about 500 consecutive amino acids, at least about 600 consecutive amino acids, at least about 700 consecutive amino acids, or at least about 800 consecutive amino acids of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. Fragments include all possible amino acid lengths between about 8 and about 100 amino acids, for example, lengths between about 10 and about 100 amino acids, between about 15 and about 100 amino acids, between about 20 and about 100 amino acids, between about 35 and about 100 amino acids, between about 40 and about 100 amino acids, between about 50 and about 100 amino acids, between about 70 and about 100 amino acids, between about 75 and about 100 amino acids, or between about 80 and about 100 amino acids. These peptide fragments can be obtained commercially or synthesized via liquid phase or solid phase synthesis methods (Atherton et al., (1989) Solid Phase Peptide Synthesis: a Practical Approach. IRL Press, Oxford, England). The EGFR fusion peptide fragments can be isolated from a natural source, genetically engineered, or chemically prepared. These methods are well known in the art.

An EGFR fusion molecule inhibitor can be a protein, such as an antibody (monoclonal, polyclonal, humanized, chimeric, or fully human), or a binding fragment thereof, directed against an EGFR fusion molecule. An antibody fragment can be a form of an antibody other than the full-length form and includes portions or components that exist within full-length antibodies, in addition to antibody fragments that have been engineered. Antibody fragments can include, but are not limited to, single chain Fv (scFv), diabodies, Fv, and (Fab′)₂, triabodies, Fc, Fab, CDR1, CDR2, CDR3, combinations of CDR's, variable regions, tetrabodies, bifunctional hybrid antibodies, framework regions, constant regions, and the like (see, Maynard et al., (2000) Ann. Rev. Biomed. Eng. 2:339-76; Hudson (1998) Curr. Opin. Biotechnol. 9:395-402). Antibodies can be obtained commercially, custom generated, or synthesized against an antigen of interest according to methods established in the art (see U.S. Pat. Nos. 6,914,128, 5,780,597, and 5,811,523; Roland E. Kontermann and Stefan Dübel (editors), Antibody Engineering, Vol. I & II, (2010) 2nd ed., Springer; Antony S. Dimitrov (editor), Therapeutic Antibodies: Methods and Protocols (Methods in Molecular Biology), (2009), Humana Press; Benny Lo (editor) Antibody Engineering: Methods and Protocols (Methods in Molecular Biology), (2004) Humana Press, each of which are hereby incorporated by reference in their entireties). For example, antibodies directed to an EGFR fusion molecule can be obtained commercially from Abcam, Santa Cruz Biotechnology, Abgent, R&D Systems, Novus Biologicals, etc. Human antibodies directed to an EGFR fusion molecule (such as monoclonal, humanized, fully human, or chimeric antibodies) can be useful antibody therapeutics for use in humans. In one embodiment, an antibody or binding fragment thereof is directed against SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495.

Inhibition of RNA encoding an EGFR fusion molecule can effectively modulate the expression of an EGFR fusion molecule. Inhibitors are selected from the group comprising: siRNA; interfering RNA or RNAi; dsRNA; RNA Polymerase III transcribed DNAs; ribozymes; and antisense nucleic acids, which can be RNA, DNA, or an artificial nucleic acid.

Antisense oligonucleotides, including antisense DNA, RNA, and DNA/RNA molecules, act to directly block the translation of mRNA by binding to targeted mRNA and preventing protein translation. For example, antisense oligonucleotides of at least about 15 bases and complementary to unique regions of the DNA sequence encoding an EGFR fusion molecule can be synthesized, e.g., by conventional phosphodiester techniques (Dallas et al., (2006) Med. Sci. Monit. 12(4):RA67-74; Kalota et al., (2006) Handb. Exp. Pharmacol. 173:173-96; Lutzelburger et al., (2006) Handb. Exp. Pharmacol. 173:243-59). Antisense nucleotide sequences include, but are not limited to: morpholinos, 2′-O-methyl polynucleotides, DNA, RNA and the like.

siRNA comprises a double stranded structure containing from about 15 to about 50 base pairs, for example from about 21 to about 25 base pairs, and having a nucleotide sequence identical or nearly identical to an expressed target gene or RNA within the cell. The siRNA comprises a sense RNA strand and a complementary antisense RNA strand annealed together by standard Watson-Crick base-pairing interactions. The sense strand comprises a nucleic acid sequence which is substantially identical to a nucleic acid sequence contained within the target mRNA molecule. “Substantially identical” to a target sequence contained within the target mRNA refers to a nucleic acid sequence that differs from the target sequence by about 3% or less. The sense and antisense strands of the siRNA can comprise two complementary, single-stranded RNA molecules, or can comprise a single molecule in which two complementary portions are base-paired and are covalently linked by a single-stranded “hairpin” area. See also, McManus and Sharp (2002) Nat Rev Genetics, 3:737-47, and Sen and Blau (2006) FASEB J., 20:1293-99, the entire disclosures of which are herein incorporated by reference.
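
By way of non-limiting illustration, the relationship between the sense strand, the antisense strand, and the target sequence can be sketched computationally. The following is a minimal sketch under stated assumptions: the function names are hypothetical, sequences are written 5′ to 3′ as RNA strings over A, C, G, and U, the example sequences are toy strings rather than sequences from the Sequence Listing, and “about 3% or less” is modeled as a mismatch fraction of at most 0.03 between two strings of equal length.

```python
# Watson-Crick pairing for RNA: A pairs with U, C pairs with G.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_strand(sense: str) -> str:
    """Return the reverse complement of an RNA sense strand (both 5'->3')."""
    return sense.translate(_COMPLEMENT)[::-1]


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def is_substantially_identical(sense: str, target: str, max_diff: float = 0.03) -> bool:
    """Assumed reading of 'differs by about 3% or less' as mismatch fraction <= 0.03."""
    return mismatch_fraction(sense, target) <= max_diff


# Toy example: the antisense strand of AUGC is GCAU.
print(antisense_strand("AUGC"))
```

Under this reading, a single mismatch in a 21-nucleotide strand (about 4.8%) exceeds the threshold, whereas a single mismatch in a 40-nucleotide strand (2.5%) does not.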

The siRNA can be altered RNA that differs from naturally-occurring RNA by the addition, deletion, substitution and/or alteration of one or more nucleotides. Such alterations can include addition of non-nucleotide material, such as to the end(s) of the siRNA or to one or more internal nucleotides of the siRNA, or modifications that make the siRNA resistant to nuclease digestion, or the substitution of one or more nucleotides in the siRNA with deoxyribonucleotides. One or both strands of the siRNA can also comprise a 3′ overhang. As used herein, a 3′ overhang refers to at least one unpaired nucleotide extending from the 3′-end of a duplexed RNA strand. For example, the siRNA can comprise at least one 3′ overhang of from 1 to about 6 nucleotides (which includes ribonucleotides or deoxyribonucleotides) in length, or from 1 to about 5 nucleotides in length, or from 1 to about 4 nucleotides in length, or from about 2 to about 4 nucleotides in length. For example, each strand of the siRNA can comprise 3′ overhangs of dithymidylic acid (“TT”) or diuridylic acid (“UU”).
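
By way of non-limiting illustration, the addition of 3′ overhangs to the two strands of a duplex can be sketched as follows. This is a minimal sketch only; the function names are hypothetical, strands are assumed to be written 5′ to 3′ so that the 3′ end is the right-hand end of the string, and the length check models the "1 to about 6 nucleotides" range as 1 to 6.

```python
from typing import Tuple


def with_3prime_overhang(strand: str, overhang: str = "TT") -> str:
    """Append an unpaired 3' overhang to a strand written 5'->3'."""
    if not 1 <= len(overhang) <= 6:
        raise ValueError("overhang length must be 1 to 6 nucleotides")
    return strand + overhang


def duplex_with_overhangs(
    sense_core: str, antisense_core: str, overhang: str = "TT"
) -> Tuple[str, str]:
    """Return (sense, antisense) strands, each carrying the same 3' overhang."""
    return (
        with_3prime_overhang(sense_core, overhang),
        with_3prime_overhang(antisense_core, overhang),
    )


# Toy 4-nucleotide cores, used only to show where the overhang is placed.
print(duplex_with_overhangs("AUGC", "GCAU", "UU"))
```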

siRNA can be produced chemically or biologically, or can be expressed from a recombinant plasmid or viral vector (for example, see U.S. Pat. Nos. 7,294,504 and 7,422,896, the entire disclosures of which are herein incorporated by reference). Exemplary methods for producing and testing dsRNA or siRNA molecules are described in U.S. Patent Application Publication No. 2002/0173478 to Gewirtz, U.S. Pat. No. 8,071,559 to Hannon et al., and in U.S. Pat. No. 7,148,342 to Tolentino et al., the entire disclosures of which are herein incorporated by reference.

In one embodiment, an siRNA directed to a human nucleic acid sequence comprising an EGFR fusion molecule can be generated against any one of SEQ ID NOS: 2, 4, 8, 10, 14, or 15. In another embodiment, an siRNA directed to a human nucleic acid sequence comprising a breakpoint of an EGFR fusion molecule can be generated against any one of SEQ ID NOS: 4, 10, or 15.
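
By way of non-limiting illustration, candidate siRNA target windows that span a fusion breakpoint can be enumerated as sketched below. This is a minimal sketch under stated assumptions and is not a description of how any sequence of the invention was selected: the function name, the 21-nucleotide window length, and the requirement of at least 5 nucleotides on each side of the breakpoint are hypothetical choices, and the transcript is a toy string in which the 5′ partner is written as a run of A and the 3′ partner as a run of C.

```python
from typing import List, Tuple


def breakpoint_spanning_windows(
    transcript: str, breakpoint: int, length: int = 21, min_flank: int = 5
) -> List[Tuple[int, str]]:
    """List (start, window) for windows of the given length that include at
    least min_flank nucleotides on each side of the breakpoint.

    breakpoint is the 0-based index of the first nucleotide contributed by
    the 3' fusion partner.
    """
    # Earliest start that still leaves min_flank nucleotides after the breakpoint.
    first = max(0, breakpoint - (length - min_flank))
    # Latest start that still leaves min_flank nucleotides before the breakpoint.
    last = min(len(transcript) - length, breakpoint - min_flank)
    return [
        (start, transcript[start:start + length])
        for start in range(first, last + 1)
    ]


# Toy fusion transcript: 30 nt from the 5' partner, 30 nt from the 3' partner.
toy_transcript = "A" * 30 + "C" * 30
windows = breakpoint_spanning_windows(toy_transcript, breakpoint=30)
print(len(windows), windows[0], windows[-1])
```

Because every returned window contains sequence from both fusion partners, such windows are absent from either unfused parental transcript in this toy model.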

RNA polymerase III transcribed DNAs contain promoters, such as the U6 promoter. These DNAs can be transcribed to produce small hairpin RNAs in the cell that can function as siRNA or linear RNAs, which can function as antisense RNA. The EGFR fusion molecule inhibitor can comprise ribonucleotides, deoxyribonucleotides, synthetic nucleotides, or any suitable combination such that the target RNA and/or gene is inhibited. In addition, these forms of nucleic acid can be single, double, triple, or quadruple stranded. (See for example Bass (2001) Nature, 411:428-429; Elbashir et al., (2001) Nature, 411:494-498; U.S. Pat. No. 6,509,154; U.S. Patent Application Publication No. 2003/0027783; and PCT Publication Nos. WO 00/044895, WO 99/032619, WO 00/01846, WO 01/029058, WO 00/044914).

An EGFR fusion molecule inhibitor can be a small molecule that binds to an EGFR fusion protein described herein and disrupts its function. Small molecules are a diverse group of synthetic and natural substances generally having low molecular weights. They can be isolated from natural sources (for example, plants, fungi, microbes and the like), obtained commercially and/or as libraries or collections, or synthesized. Candidate small molecules that inhibit an EGFR fusion protein can be identified via in silico screening or high-throughput (HTP) screening of combinatorial libraries according to methods established in the art (e.g., see Potyrailo et al., (2011) ACS Comb Sci. 13(6):579-633; Mensch et al., (2009) J Pharm Sci. 98(12):4429-68; Schnur (2008) Curr Opin Drug Discov Devel. 11(3):375-80; and Jhoti (2007) Ernst Schering Found Symp Proc. (3):169-85, each of which are hereby incorporated by reference in their entireties). Most conventional pharmaceuticals, such as aspirin, penicillin, and many chemotherapeutics, are small molecules, can be obtained commercially, can be chemically synthesized, or can be obtained from random or combinatorial libraries as described below (see, e.g., Werner et al., (2006) Brief Funct Genomic Proteomic 5(1):32-6).

Non-limiting examples of EGFR fusion molecule inhibitors include the EGFR inhibitors AZD4547 (see Gavine et al., (2012) Cancer Res, 72(8); 2045-56; see also PCT Application Publication No. WO2008/075068, each of which are hereby incorporated by reference in their entireties); NVP-BGJ398 (see Guagnano et al., (2011) J. Med. Chem., 54:7066-7083; see also U.S. Patent Application Publication No. 2008-0312248 A1, each of which are hereby incorporated by reference in their entireties); PD173074 (see Guagnano et al., (2011) J. Med. Chem., 54:7066-7083; see also Mohammadi et al., (1998) EMBO J., 17:5896-5904, each of which are hereby incorporated by reference in their entireties); NF449 (EMD Millipore (Billerica, MA) Cat. No. 480420; see also Krejci, (2010) the Journal of Biological Chemistry, 285(27):20644-20653, which is hereby incorporated by reference in its entirety); LY2874455 (Active Biochem; see Zhao et al. (2011) Mol Cancer Ther. (11):2200-10; see also PCT Application Publication No. WO 2010129509, each of which are hereby incorporated by reference in their entireties); TKI258 (Dovitinib); BIBF-1120 (Intedanib-Vargatef); BMS-582664 (Brivanib alaninate); AZD-2171 (Cediranib); TSU-68 (Orantinib); AB-1010 (Masitinib); AP-24534 (Ponatinib); and E-7080 (by Eisai). A non-limiting example of an EGFR fusion molecule inhibitor includes the inhibitor KHS101 (Wurdak et al., (2010) PNAS, 107(38): 16542-47, which is hereby incorporated by reference in its entirety).

Structures of EGFR fusion molecule inhibitors useful for the invention include, but are not limited to: the EGFR inhibitor AZD4547,

the EGFR inhibitor NVP-BGJ398,

the EGFR inhibitor PD173074,

the EGFR inhibitor LY2874455

and the EGFR inhibitor NF449 (EMD Millipore (Billerica, MA) Cat. No. 480420),

Other EGFR inhibitors include, but are not limited to:

A structure of an EGFR fusion molecule inhibitor useful for the invention includes, but is not limited to, the inhibitor KHS101,

Assessment and Therapeutic Treatment

The invention provides a method of decreasing the growth of a solid tumor in a subject. The tumor can be associated with, but is not limited to, glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In one embodiment, the method comprises detecting the presence of an EGFR fusion molecule in a sample obtained from a subject. In some embodiments, the sample is incubated with an agent that binds to an EGFR fusion molecule, such as an antibody, a probe, a nucleic acid primer, and the like. In further embodiments, the method comprises administering to the subject an effective amount of an EGFR fusion molecule inhibitor, wherein the inhibitor decreases the size of the solid tumor.

The invention also provides a method for treating or preventing a gene-fusion associated cancer in a subject, such as, but not limited to, glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In one embodiment, the method comprises detecting the presence of an EGFR fusion molecule in a sample obtained from a subject, the presence of the fusion being indicative of a gene-fusion associated cancer, and administering to the subject in need a therapeutic treatment against a gene-fusion associated cancer. In some embodiments, the sample is incubated with an agent that binds to an EGFR fusion molecule, such as an antibody, a probe, a nucleic acid primer, and the like.

The invention also provides a method for decreasing, in a subject in need thereof, the expression level or activity of a fusion protein comprising the tyrosine kinase domain of an EGFR protein fused to a polypeptide that constitutively activates the tyrosine kinase domain of the EGFR protein. In some embodiments, the method comprises obtaining a biological sample from the subject. In some embodiments, the sample is incubated with an agent that binds to an EGFR fusion molecule, such as an antibody, a probe, a nucleic acid primer, and the like. In some embodiments, the method comprises administering to the subject a therapeutic amount of a composition comprising an admixture of a pharmaceutically acceptable carrier and an inhibitor of the fusion protein of the invention. In another embodiment, the method further comprises determining the fusion protein expression level or activity. In another embodiment, the method further comprises detecting whether the fusion protein expression level or activity is decreased as compared to the fusion protein expression level or activity prior to administration of the composition, thereby decreasing the expression level or activity of the fusion protein. In some embodiments, the fusion protein is an EGFR-PSPH fusion protein, an EGFR-CAND fusion protein, or an EGFR-SEPT fusion protein.

The administering step in each of the claimed methods can comprise a drug administration, such as administration of an EGFR fusion molecule inhibitor (for example, a pharmaceutical composition comprising an antibody that specifically binds to an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, an EGFR-CAND fusion protein, or a fragment thereof; a small molecule that specifically binds to an EGFR protein; an antisense RNA or antisense DNA that decreases expression of an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein; or an siRNA that specifically targets an EGFR-SEPT fusion gene, an EGFR-PSPH fusion gene, or an EGFR-CAND fusion gene). In one embodiment, the therapeutic molecule to be administered comprises a polypeptide of an EGFR fusion molecule, comprising at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 93%, at least about 95%, at least about 97%, at least about 98%, at least about 99%, or 100% of the amino acid sequence of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495, and exhibits the function of decreasing expression of such a protein, thus treating a gene fusion-associated cancer. In another embodiment, administration of the therapeutic molecule decreases the size of the solid tumor associated with glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In a further embodiment, administration of the therapeutic molecule decreases cell proliferation in a subject afflicted with a gene-fusion associated cancer.
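The graded percentage thresholds recited above can be illustrated with a short Python sketch. It assumes one possible reading, namely that the percentage is a percent identity computed over a pre-aligned pair of sequences; the function names and the toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
# Thresholds recited in the text (at least about 75% ... or 100%).
THRESHOLDS = (75, 80, 85, 90, 93, 95, 97, 98, 99, 100)


def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of reference residues matched by the query.

    Both inputs are assumed to be already aligned (equal length,
    '-' marking gaps). The denominator is the number of non-gap
    reference positions, which is one convention among several.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_positions = sum(1 for r in aligned_ref if r != "-")
    if ref_positions == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for r, q in zip(aligned_ref, aligned_query) if r != "-" and r == q
    )
    return 100.0 * matches / ref_positions


def thresholds_met(pct: float) -> list[int]:
    """Return every recited threshold that the given percentage satisfies."""
    return [t for t in THRESHOLDS if pct >= t]


if __name__ == "__main__":
    ref = "MKTAYIAKQR"      # hypothetical 10-residue reference
    query = "MKTAYIAKQK"    # differs at the last position
    pct = percent_identity(ref, query)
    print(pct, thresholds_met(pct))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only covers the counting step.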

In another embodiment, the therapeutic molecule to be administered comprises an siRNA directed to a human nucleic acid sequence comprising an EGFR fusion molecule. In one embodiment, the siRNA is directed to any one of SEQ ID NOS: 2, 4, 8, 10, 14, or 15. In a further embodiment, the therapeutic molecule to be administered comprises an antibody or binding fragment thereof that is directed against SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 16, or 8495. In some embodiments, the therapeutic molecule to be administered comprises a small molecule that specifically binds to an EGFR protein, such as AZD4547, NVP-BGJ398, PD173074, NF449, TKI258, BIBF-1120, BMS-582664, AZD-2171, TSU-68, AB-1010, AP-24534, E-7080, or LY2874455.

An EGFR fusion molecule, for example, a fusion between EGFR and SEPT, PSPH, or CAND, can be determined at the level of the DNA, RNA, or polypeptide. Optionally, detection can be determined by performing an oligonucleotide ligation assay, a confirmation based assay, a hybridization assay, a sequencing assay, an allele-specific amplification assay, a microsequencing assay, a melting curve analysis, a denaturing high performance liquid chromatography (DHPLC) assay (for example, see Jones et al., (2000) Hum Genet., 106(6):663-8), or a combination thereof. In one embodiment, the detection is performed by sequencing all or part of an EGFR fusion molecule (e.g., an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion), or by selective hybridization or amplification of all or part of an EGFR fusion molecule (e.g., an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion). An EGFR fusion molecule specific amplification (e.g., EGFR-SEPT (such as EGFR-SEPT14), EGFR-CAND (such as EGFR-CAND1), or EGFR-PSPH nucleic acid specific amplification) can be carried out before the fusion identification step.

The invention provides for a method of detecting a chromosomal alteration in a subject afflicted with a gene-fusion associated cancer. In some embodiments, the gene-fusion associated cancer comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. In one embodiment, the chromosomal alteration is an in-frame fused transcript described herein, for example an EGFR fusion molecule. An alteration in a chromosome region occupied by an EGFR fusion molecule, such as a nucleic acid encoding an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion, can be any form of mutation(s), deletion(s), rearrangement(s) and/or insertion(s) in the coding and/or non-coding region of the locus, alone or in various combination(s). Mutations can include point mutations. Insertions can encompass the addition of one or several residues in a coding or non-coding portion of the gene locus. Insertions can comprise an addition of between 1 and 50 base pairs in the gene locus. Deletions can encompass any region of one, two or more residues in a coding or non-coding portion of the gene locus, such as from two residues up to the entire gene or locus. Deletions can affect smaller regions, such as domains (introns) or repeated sequences or fragments of less than about 50 consecutive base pairs, although larger deletions can occur as well. Rearrangement includes inversion of sequences. The alteration in a chromosome region occupied by an EGFR fusion molecule, e.g., a nucleic acid encoding an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion, can result in amino acid substitutions, RNA splicing or processing, product instability, the creation of stop codons, production of oncogenic fusion proteins, frame-shift mutations, and/or truncated polypeptide production. The alteration can result in the production of an EGFR fusion molecule, for example, a nucleic acid encoding an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion, with altered function, stability, targeting or structure. The alteration can also cause a reduction, or even an increase, in protein expression. In one embodiment, the alteration in the chromosome region occupied by an EGFR fusion molecule can comprise a chromosomal rearrangement resulting in the production of an EGFR fusion molecule, such as an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion. This alteration can be determined at the level of the DNA, RNA, or polypeptide. In another embodiment, the detection or determination comprises nucleic acid sequencing, selective hybridization, selective amplification, gene expression analysis, or a combination thereof. In another embodiment, the detection or determination comprises protein expression analysis, for example by western blot analysis, ELISA, or other antibody detection methods.

The present invention provides a method for treating a gene-fusionassociated cancer in a subject in need thereof. In one embodiment, themethod comprises obtaining a sample from the subject to determine thelevel of expression of an EGFR fusion molecule in the subject. In someembodiments, the sample is incubated with an agent that binds to an EGFRfusion molecule, such as an antibody, a probe, a nucleic acid primer,and the like. In another embodiment, the detection or determinationcomprises nucleic acid sequencing, selective hybridization, selectiveamplification, gene expression analysis, or a combination thereof. Inanother embodiment, the detection or determination comprises proteinexpression analysis, for example by western blot analysis, ELISA, orother antibody detection methods. In some embodiments, the methodfurther comprises assessing whether to administer an EGFR fusionmolecule inhibitor based on the expression pattern of the subject. Infurther embodiments, the method comprises administering an EGFR fusionmolecule inhibitor to the subject. In one embodiment, the gene-fusionassociated cancer comprises glioblastoma multiforme, breast cancer, lungcancer, prostate cancer, or colorectal carcinoma.

In one embodiment, the invention provides for a method of detecting the presence of altered RNA expression of an EGFR fusion molecule in a subject, for example, one afflicted with a gene-fusion associated cancer. In another embodiment, the invention provides for a method of detecting the presence of an EGFR fusion molecule in a subject. In some embodiments, the method comprises obtaining a sample from the subject to determine whether the subject expresses an EGFR fusion molecule. In some embodiments, the sample is incubated with an agent that binds to an EGFR fusion molecule, such as an antibody, a probe, a nucleic acid primer, and the like. In other embodiments, the detection or determination comprises nucleic acid sequencing, selective hybridization, selective amplification, gene expression analysis, or a combination thereof. In another embodiment, the detection or determination comprises protein expression analysis, for example by western blot analysis, ELISA, or other antibody detection methods. In some embodiments, the method further comprises assessing whether to administer an EGFR fusion molecule inhibitor based on the expression pattern of the subject. In further embodiments, the method comprises administering an EGFR fusion molecule inhibitor to the subject. Altered RNA expression includes the presence of an altered RNA sequence, the presence of altered RNA splicing or processing, or the presence of an altered quantity of RNA. These can be detected by various techniques known in the art, including sequencing all or part of the RNA or by selective hybridization or selective amplification of all or part of the RNA.

In a further embodiment, the method can comprise detecting the presenceor expression of an EGFR fusion molecule, such as a nucleic acidencoding an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), anEGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion.Altered polypeptide expression includes the presence of an alteredpolypeptide sequence, the presence of an altered quantity ofpolypeptide, or the presence of an altered tissue distribution. Thesecan be detected by various techniques known in the art, including bysequencing and/or binding to specific ligands (such as antibodies). Inone embodiment, the detecting comprises using a northern blot; real timePCR and primers directed to SEQ ID NOS: 2, 4, 8, 10, 14, or 15; aribonuclease protection assay; a hybridization, amplification, orsequencing technique to detect an EGFR fusion molecule, such as onecomprising SEQ ID NOS: 2, 4, 8, 10, 14, or 15; or a combination thereof.In another embodiment, the PCR primers comprise SEQ ID NOS 32, 33, 34,35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, or 89. In a furtherembodiment, primers used for the screening of EGFR fusion molecules,comprise SEQ ID NOS 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,45, 87, 88, or 89. In some embodiments, primers used for genomicdetection of an EGFR fusion comprise SEQ ID NOS 40, 41, 42, 43, 44, 45,or 89.

Various techniques known in the art can be used to detect or quantify altered gene or RNA expression or nucleic acid sequences, which include, but are not limited to, hybridization, sequencing, amplification, and/or binding to specific ligands (such as antibodies). Other suitable methods include allele-specific oligonucleotide (ASO), oligonucleotide ligation, allele-specific amplification, Southern blot (for DNAs), Northern blot (for RNAs), single-stranded conformation analysis (SSCA), PFGE, fluorescent in situ hybridization (FISH), gel migration, clamped denaturing gel electrophoresis, denaturing HPLC, melting curve analysis, heteroduplex analysis, RNase protection, chemical or enzymatic mismatch cleavage, ELISA, radio-immunoassays (RIA) and immuno-enzymatic assays (IEMA).

Some of these approaches (such as SSCA and constant gradient gel electrophoresis (CGGE)) are based on a change in electrophoretic mobility of the nucleic acids, as a result of the presence of an altered sequence. According to these techniques, the altered sequence is visualized by a shift in mobility on gels. The fragments can then be sequenced to confirm the alteration. Some other approaches are based on specific hybridization between nucleic acids from the subject and a probe specific for wild type or altered gene or RNA. The probe can be in suspension or immobilized on a substrate. The probe can be labeled to facilitate detection of hybrids. Some of these approaches are suited for assessing a polypeptide sequence or expression level, such as Northern blot, ELISA and RIA. The latter require the use of a ligand specific for the polypeptide, for example, the use of a specific antibody.

Hybridization. Hybridization detection methods are based on theformation of specific hybrids between complementary nucleic acidsequences that serve to detect nucleic acid sequence alteration(s). Adetection technique involves the use of a nucleic acid probe specificfor a wild type or altered gene or RNA, followed by the detection of thepresence of a hybrid. The probe can be in suspension or immobilized on asubstrate or support (for example, as in nucleic acid array or chipstechnologies). The probe can be labeled to facilitate detection ofhybrids. In one embodiment, the probe according to the invention cancomprise a nucleic acid directed to SEQ ID NOS: 2, 4, 8, 10, 14, or 15.For example, a sample from the subject can be contacted with a nucleicacid probe specific for a gene encoding an EGFR fusion molecule, and theformation of a hybrid can be subsequently assessed. In one embodiment,the method comprises contacting simultaneously the sample with a set ofprobes that are specific for an EGFR fusion molecule. Also, varioussamples from various subjects can be investigated in parallel.

According to the invention, a probe can be a polynucleotide sequence which is complementary to and specifically hybridizes with a, or a target portion of a, gene or RNA corresponding to an EGFR fusion molecule. Useful probes are those that are complementary to the gene, RNA, or target portion thereof. Probes can comprise single-stranded nucleic acids of between 8 and 1000 nucleotides in length, for instance between 10 and 800, between 15 and 700, or between 20 and 500. Longer probes can be used as well. A useful probe of the invention is a single stranded nucleic acid molecule of between 8 and 500 nucleotides in length, which can specifically hybridize to a region of a gene or RNA that corresponds to an EGFR fusion molecule.
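As a rough illustration of the complementarity and length criteria described above, the following Python sketch checks that a probe falls within the 8 to 1000 nucleotide window and locates the sites on a target strand to which it is perfectly complementary. Real hybridization depends on temperature, salt, and mismatch tolerance, none of which is modeled here; the function names and sequences are hypothetical.

```python
# Translation table for DNA complementation (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_hybridization_sites(
    probe: str, target: str, min_len: int = 8, max_len: int = 1000
) -> list[int]:
    """Return 0-based start positions on `target` where `probe` is
    perfectly complementary (exact match to its reverse complement).

    The default length window mirrors the 8 to 1000 nucleotide range
    given in the text; exact matching is a simplifying assumption.
    """
    if not (min_len <= len(probe) <= max_len):
        raise ValueError("probe length outside the allowed window")
    site = reverse_complement(probe).upper()
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


if __name__ == "__main__":
    target = "GGATCCAAGTTTGCACGTACGATCGG"   # hypothetical target strand
    probe = "GTGCAAACTT"                     # reverse complement of AAGTTTGCAC
    print(probe_hybridization_sites(probe, target))
```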

The sequence of the probes can be derived from the sequences of the EGFR fusion genes provided herein. Nucleotide substitutions can be performed, as well as chemical modifications of the probe. Such chemical modifications can be accomplished to increase the stability of hybrids (e.g., intercalating groups) or to label the probe. Some examples of labels include, without limitation, radioactivity, fluorescence, luminescence, and enzymatic labeling.

A guide to the hybridization of nucleic acids is found in, e.g., Sambrook, ed., Molecular Cloning: A Laboratory Manual (3rd Ed.), Vols. 1-3, Cold Spring Harbor Laboratory, 1989; Current Protocols In Molecular Biology, Ausubel, ed. John Wiley & Sons, Inc., New York, 2001; Laboratory Techniques In Biochemistry And Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y., 1993.

Sequencing. Sequencing can be carried out using techniques well known in the art, using automatic sequencers. The sequencing can be performed on the complete EGFR fusion molecule or on specific domains thereof.
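One simple way sequencing output could be screened for a fusion is to look for reads that span the junction between the two partner sequences. The Python sketch below assumes exact string matching against a junction built from the end of the 5' partner and the start of the 3' partner; the helper names, the flank length, and all sequences are hypothetical placeholders rather than sequences from this disclosure.

```python
def junction_sequence(five_prime: str, three_prime: str, flank: int = 10) -> str:
    """Build a junction-spanning sequence from the last `flank` bases of
    the 5' partner and the first `flank` bases of the 3' partner."""
    if flank <= 0:
        raise ValueError("flank must be positive")
    if len(five_prime) < flank or len(three_prime) < flank:
        raise ValueError("partner sequences shorter than flank")
    return (five_prime[-flank:] + three_prime[:flank]).upper()


def reads_spanning_junction(reads: list[str], junction: str) -> list[str]:
    """Return the reads that contain the junction sequence exactly.

    Exact matching ignores sequencing errors and reverse-strand reads;
    a real pipeline would use an aligner instead.
    """
    junction = junction.upper()
    return [read for read in reads if junction in read.upper()]


if __name__ == "__main__":
    partner_a = "ATGCGTACCGGATTACAGGC"   # hypothetical 5' partner fragment
    partner_b = "TTGACCGTAAGCTAGGCTAA"   # hypothetical 3' partner fragment
    junction = junction_sequence(partner_a, partner_b, flank=6)
    reads = [
        "GGATTACAGGCTTGACCGTAAG",   # crosses the junction
        "ATGCGTACCGGATTACAGGC",     # 5' partner only
        "TTGACCGTAAGCTAGGCTAA",     # 3' partner only
    ]
    print(junction, reads_spanning_junction(reads, junction))
```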

Amplification. Amplification is based on the formation of specific hybrids between complementary nucleic acid sequences that serve to initiate nucleic acid reproduction. Amplification can be performed according to various techniques known in the art, such as by polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA) and nucleic acid sequence based amplification (NASBA). These techniques can be performed using commercially available reagents and protocols. Useful techniques in the art encompass real-time PCR, allele-specific PCR, or PCR based single-strand conformational polymorphism (SSCP). Amplification usually requires the use of specific nucleic acid primers to initiate the reaction. For example, nucleic acid primers useful for amplifying sequences corresponding to an EGFR fusion molecule are able to specifically hybridize with a portion of the gene locus that flanks a target region of the locus. In one embodiment, amplification comprises using forward and reverse PCR primers directed to SEQ ID NOS: 2, 4, 8, 10, 14, or 15. Nucleic acid primers useful for amplifying sequences from an EGFR fusion molecule specifically hybridize with a portion of the EGFR fusion molecule. In certain subjects, the presence of an EGFR fusion molecule corresponds to a subject with a gene fusion-associated cancer. In one embodiment, amplification can comprise using forward and reverse PCR primers comprising nucleotide sequences of SEQ ID NOS: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, or 89.

Non-limiting amplification methods include, e.g., polymerase chainreaction, PCR (PCR Protocols, A Guide To Methods And Applications, ed.Innis, Academic Press, N.Y., 1990 and PCR Strategies, 1995, ed. Innis,Academic Press, Inc., N.Y.); ligase chain reaction (LCR) (Wu (1989)Genomics 4:560; Landegren (1988) Science 241:1077; Barringer (1990) Gene89:117); transcription amplification (Kwoh (1989) PNAS 86:1173); and,self-sustained sequence replication (Guatelli (1990) PNAS 87:1874); QBeta replicase amplification (Smith (1997) J. Clin. Microbiol.35:1477-1491), automated Q-beta replicase amplification assay (Burg(1996) Mol. Cell. Probes 10:257-271) and other RNA polymerase mediatedtechniques (e.g., NASBA, Cangene, Mississauga, Ontario; see also Berger(1987) Methods Enzymol. 152:307-316; U.S. Pat. Nos. 4,683,195 and4,683,202; and Sooknanan (1995) Biotechnology 13:563-564). All thereferences stated above are incorporated by reference in theirentireties.

The invention provides for a nucleic acid primer, wherein the primer canbe complementary to and hybridize specifically to a portion of an EGFRfusion molecule, such as a nucleic acid (e.g., DNA or RNA), in certainsubjects having a gene fusion-associated cancer. In one embodiment, thegene-fusion associated cancer comprises glioblastoma multiforme, breastcancer, lung cancer, prostate cancer, or colorectal carcinoma. Primersof the invention can be specific for fusion sequences in a nucleic acid(DNA or RNA) encoding an EGFR-SEPT fusion (such as an EGFR-SEPT14fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or anEGFR-PSPH fusion. By using such primers, the detection of anamplification product indicates the presence of a fusion of a nucleicacid encoding an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), anEGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion.Examples of primers of this invention can be single-stranded nucleicacid molecules of about 5 to 60 nucleotides in length, or about 8 toabout 25 nucleotides in length. The sequence can be derived directlyfrom the sequence of an EGFR fusion molecule, e.g. a nucleic acidencoding an EGFR-SEPT fusion (such as an EGFR-SEPT14 fusion), anEGFR-CAND fusion (such as an EGFR-CAND1 fusion), or an EGFR-PSPH fusion.Perfect complementarity is useful to ensure high specificity; however,certain mismatch can be tolerated. For example, a nucleic acid primer ora pair of nucleic acid primers as described above can be used in amethod for detecting the presence of a gene fusion-associated cancer ina subject. In one embodiment, primers can be used to detect an EGFRfusion molecule, such as a primer comprising SEQ ID NOS: 32, 33, 34, 35,36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, 89; or a combinationthereof.
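The reasoning that an amplification product indicates the presence of a fusion can be sketched as a toy in silico PCR in Python. It assumes a forward primer that matches one fusion partner and a reverse primer that binds the other, so that a product is predicted only when both partners lie on the same template in the expected order. Perfect primer matching is assumed, and the function names, primers, and templates are hypothetical, not the primers of the Sequence Listing.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted amplicon on the given (sense) template, or
    None when either primer site is missing or the sites are in the
    wrong order. Only exact matches are considered."""
    template = template.upper()
    fwd_site = forward.upper()
    # The reverse primer anneals to the sense strand's complement, so
    # its binding site appears on the sense strand as its reverse complement.
    rev_site = reverse_complement(reverse.upper())
    f = template.find(fwd_site)
    if f == -1:
        return None
    r = template.find(rev_site, f + len(fwd_site))
    if r == -1:
        return None
    return template[f : r + len(rev_site)]


if __name__ == "__main__":
    fused = "TTTTACGTGCATCCCCCCGGATTCAGAAAA"      # hypothetical fusion template
    unfused = "TTTTACGTGCATCCCCCCAAAAAAAAAAAA"    # lacks the 3' partner site
    fwd, rev = "ACGTGCAT", "CTGAATCC"
    print(in_silico_pcr(fused, fwd, rev))     # amplicon predicted
    print(in_silico_pcr(unfused, fwd, rev))   # None, no product
```

The design choice of requiring the reverse site downstream of the forward site reflects the idea that the primers flank the fusion junction; mismatch tolerance, which the text notes can be acceptable, is deliberately left out.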

Specific Ligand Binding. As discussed herein, a nucleic acid encoding an EGFR fusion molecule or expression of an EGFR fusion molecule can also be detected by screening for alteration(s) in a sequence or expression level of a polypeptide encoded by the same. Different types of ligands can be used, such as specific antibodies. In one embodiment, the sample is contacted with an antibody specific for a polypeptide encoded by an EGFR fusion molecule and the formation of an immune complex is subsequently determined. Various methods for detecting an immune complex can be used, such as ELISA, radioimmunoassays (RIA) and immuno-enzymatic assays (IEMA).

For example, an antibody can be a polyclonal antibody, a monoclonalantibody, as well as fragments or derivatives thereof havingsubstantially the same antigen specificity. Fragments include Fab,Fab′2, or CDR regions. Derivatives include single-chain antibodies,humanized antibodies, or poly-functional antibodies. An antibodyspecific for a polypeptide encoded by an EGFR fusion molecule can be anantibody that selectively binds such a polypeptide. In one embodiment,the antibody is raised against a polypeptide encoded by an EGFR fusionmolecule or an epitope-containing fragment thereof. Althoughnon-specific binding towards other antigens can occur, binding to thetarget polypeptide occurs with a higher affinity and can be reliablydiscriminated from non-specific binding. In one embodiment, the methodcan comprise contacting a sample from the subject with an antibodyspecific for an EGFR fusion molecule, and determining the presence of animmune complex. Optionally, the sample can be contacted to a supportcoated with antibody specific for an EGFR fusion molecule. In oneembodiment, the sample can be contacted simultaneously, or in parallel,or sequentially, with various antibodies specific for different forms ofan EGFR fusion molecule, e.g., EGFR-SEPT fusion (such as an EGFR-SEPT14fusion), an EGFR-CAND fusion (such as an EGFR-CAND1 fusion), or anEGFR-PSPH fusion.

The invention also provides for a diagnostic kit comprising products and reagents for detecting in a sample from a subject the presence of an EGFR fusion molecule. The kit can be useful for determining whether a sample from a subject exhibits reduced expression of an EGFR fusion molecule. For example, the diagnostic kit according to the present invention comprises any primer, any pair of primers, any nucleic acid probe and/or any ligand, or any antibody directed specifically to an EGFR fusion molecule. The diagnostic kit according to the present invention can further comprise reagents and/or protocols for performing a hybridization, amplification, or antigen-antibody immune reaction. In one embodiment, the kit can comprise nucleic acid primers that specifically hybridize to and can prime a polymerase reaction from an EGFR fusion molecule, the primers comprising SEQ ID NOS: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, 89, or a combination thereof. In one embodiment, primers can be used to detect an EGFR fusion molecule, such as a primer comprising SEQ ID NOS: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, 89, or a combination thereof. In a further embodiment, primers used for the screening of EGFR fusion molecules comprise SEQ ID NOS: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 87, 88, or 89.

The diagnosis methods can be performed in vitro, ex vivo, or in vivo. These methods utilize a sample from the subject in order to assess the status of an EGFR fusion molecule. The sample can be any biological sample derived from a subject, which contains nucleic acids or polypeptides. Examples of such samples include, but are not limited to, fluids, tissues, cell samples, organs, and tissue biopsies. Non-limiting examples of samples include blood, liver, plasma, serum, saliva, urine, or seminal fluid. The sample can be collected according to conventional techniques and used directly for diagnosis or stored. The sample can be treated prior to performing the method, in order to render or improve availability of nucleic acids or polypeptides for testing. Treatments include, for instance, lysis (e.g., mechanical, physical, or chemical) and centrifugation. The nucleic acids and/or polypeptides can be pre-purified or enriched by conventional techniques, and/or reduced in complexity. Nucleic acids and polypeptides can also be treated with enzymes or other chemical or physical treatments to produce fragments thereof. In one embodiment, the sample is contacted with reagents, such as probes, primers, or ligands, in order to assess the presence of an EGFR fusion molecule. Contacting can be performed in any suitable device, such as a plate, tube, well, or glass. In some embodiments, the contacting is performed on a substrate coated with the reagent, such as a nucleic acid array or a specific ligand array. The substrate can be a solid or semi-solid substrate such as any support comprising glass, plastic, nylon, paper, metal, or polymers. The substrate can be of various forms and sizes, such as a slide, a membrane, a bead, a column, or a gel. The contacting can be made under any condition suitable for a complex to be formed between the reagent and the nucleic acids or polypeptides of the sample.

Nucleic Acid Delivery Methods

Delivery of nucleic acids into viable cells can be effected ex vivo, in situ, or in vivo by use of vectors, such as viral vectors (e.g., lentivirus, adenovirus, adeno-associated virus, or a retrovirus), or ex vivo by use of physical DNA transfer methods (e.g., liposomes or chemical treatments). Non-limiting techniques suitable for the transfer of nucleic acid into mammalian cells in vitro include the use of liposomes, electroporation, microinjection, cell fusion, DEAE-dextran, and the calcium phosphate precipitation method (see, for example, Anderson, Nature (1998) supplement to 392(6679):250). Introduction of a nucleic acid or a gene encoding a polypeptide of the invention can also be accomplished with extrachromosomal substrates (transient expression) or artificial chromosomes (stable expression). Cells can also be cultured ex vivo in the presence of therapeutic compositions of the present invention in order to proliferate or to produce a desired effect on or activity in such cells. Treated cells can then be introduced in vivo for therapeutic purposes.

Nucleic acids can be inserted into vectors and used as gene therapyvectors. A number of viruses have been used as gene transfer vectors,including papovaviruses, e.g., SV40 (Madzak et al., (1992) J Gen Virol.73(Pt 6):1533-6), adenovirus (Berkner (1992) Curr Top Microbiol Immunol.158:39-66; Berkner (1988) Biotechniques, 6(7):616-29; Gorziglia andKapikian (1992) J Virol. 66(7):4407-12; Quantin et al., (1992) Proc NatlAcad Sci USA. 89(7):2581-4; Rosenfeld et al., (1992) Cell. 68(1):143-55;Wilkinson et al., (1992) Nucleic Acids Res. 20(9):2233-9;Stratford-Perricaudet et al., (1990) Hum Gene Ther. 1(3):241-56),vaccinia virus (Moss (1992) Curr Opin Biotechnol. 3(5):518-22),adeno-associated virus (Muzyczka, (1992) Curr Top Microbiol Immunol.158:97-129; Ohi et al., (1990) Gene. 89(2):279-82), herpesvirusesincluding HSV and EBV (Margolskee (1992) Curr Top Microbiol Immunol.158:67-95; Johnson et al., (1992) Brain Res Mol Brain Res.12(1-3):95-102; Fink et al., (1992) Hum Gene Ther. 3(1):11-9;Breakefield and Geller (1987) Mol Neurobiol. 1(4):339-71; Freese et al.,(1990) Biochem Pharmacol. 40(10):2189-99), and retroviruses of avian(Bandyopadhyay and Temin (1984) Mol Cell Biol. 4(4):749-54; Petropouloset al., (1992) J Virol. 66(6):3391-7), murine (Miller et al. (1992) MolCell Biol. 12(7):3262-72; Miller et al., (1985) J Virol. 55(3):521-6;Sorge et al., (1984) Mol Cell Biol. 4(9):1730-7; Mann and Baltimore(1985) J Virol. 54(2):401-7; Miller et al., (1988) J Virol.62(11):4337-45), and human origin (Shimada et al., (1991)J Clin Invest.88(3):1043-7; Helseth et al., (1990) J Virol. 64(12):6314-8; Page etal., (1990) J Virol. 64(11):5270-6; Buchschacher and Panganiban (1992) JVirol. 66(5):2731-9).

Non-limiting examples of in vivo gene transfer techniques include transfection with viral (e.g., retroviral) vectors (see U.S. Pat. No. 5,252,479, which is incorporated by reference in its entirety) and viral coat protein-liposome mediated transfection (Dzau et al., (1993) Trends in Biotechnology 11:205-210, incorporated entirely by reference). For example, naked DNA vaccines are generally known in the art; see Brower, (1998) Nature Biotechnology, 16:1304-1305, which is incorporated by reference in its entirety. Gene therapy vectors can be delivered to a subject by, for example, intravenous injection, local administration (see, e.g., U.S. Pat. No. 5,328,470) or by stereotactic injection (see, e.g., Chen, et al., (1994) Proc. Natl. Acad. Sci. USA 91:3054-3057). The pharmaceutical preparation of the gene therapy vector can include the gene therapy vector in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery vehicle is imbedded. Alternatively, where the complete gene delivery vector can be produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical preparation can include one or more cells that produce the gene delivery system.

For reviews of nucleic acid delivery protocols and methods see Anderson et al. (1992) Science 256:808-813; U.S. Pat. Nos. 5,252,479, 5,747,469, 6,017,524, 6,143,290, 6,410,010, 6,511,847; and U.S. Application Publication No. 2002/0077313, which are all hereby incorporated by reference in their entireties. For additional reviews, see Friedmann (1989) Science, 244:1275-1281; Verma, Scientific American: 68-84 (1990); Miller (1992) Nature, 357:455-460; Kikuchi et al. (2008) J Dermatol Sci. 50(2):87-98; Isaka et al. (2007) Expert Opin Drug Deliv. 4(5):561-71; Jager et al. (2007) Curr Gene Ther. 7(4):272-83; Waehler et al. (2007) Nat Rev Genet. 8(8):573-87; Jensen et al. (2007) Ann Med. 39(2):108-15; Herweijer et al. (2007) Gene Ther. 14(2):99-107; Eliyahu et al. (2005) Molecules 10(1):34-64; and Altaras et al. (2005) Adv Biochem Eng Biotechnol. 99:193-260, all of which are hereby incorporated by reference in their entireties.

An EGFR fusion nucleic acid can also be delivered in a controlled release system. For example, the EGFR fusion molecule can be administered using intravenous infusion, an implantable osmotic pump, a transdermal patch, liposomes, or other modes of administration. In one embodiment, a pump can be used (see Sefton (1987) Biomed. Eng. 14:201; Buchwald et al. (1980) Surgery 88:507; Saudek et al. (1989) N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used (see Medical Applications of Controlled Release, Langer and Wise (eds.), CRC Press, Boca Raton, Fla. (1974); Controlled Drug Bioavailability, Drug Product Design and Performance, Smolen and Ball (eds.), Wiley, New York (1984); Langer and Peppas, (1983) J. Macromol. Sci. Rev. Macromol. Chem. 23:61; see also Levy et al. (1985) Science 228:190; During et al. (1989) Ann. Neurol. 25:351; Howard et al. (1989) J. Neurosurg. 71:105). In yet another embodiment, a controlled release system can be placed in proximity of the therapeutic target thus requiring only a fraction of the systemic dose (see, e.g., Goodson, in Medical Applications of Controlled Release, supra, vol. 2, pp. 115-138 (1984)). Other controlled release systems are discussed in the review by Langer (Science (1990) 249:1527-1533).

Pharmaceutical Compositions and Administration for Therapy

An inhibitor of the invention can be incorporated into pharmaceutical compositions suitable for administration, for example the inhibitor and a pharmaceutically acceptable carrier.

An EGFR fusion molecule or inhibitor of the invention can be administered to the subject once (e.g., as a single injection or deposition). Alternatively, an EGFR fusion molecule or inhibitor can be administered once or twice daily to a subject in need thereof for a period of from about two to about twenty-eight days, or from about seven to about ten days. An EGFR fusion molecule or inhibitor can also be administered once or twice daily to a subject for a period of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 times per year, or a combination thereof. Furthermore, an EGFR fusion molecule or inhibitor of the invention can be co-administered with another therapeutic. Where a dosage regimen comprises multiple administrations, the effective amount of the EGFR fusion molecule or inhibitor administered to the subject can comprise the total amount of gene product administered over the entire dosage regimen.

An EGFR fusion molecule or inhibitor can be administered to a subject by any means suitable for delivering the EGFR fusion molecule or inhibitor to cells of the subject, such as cancer cells, e.g., glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma. For example, an EGFR fusion molecule or inhibitor can be administered by methods suitable to transfect cells. Transfection methods for eukaryotic cells are well known in the art, and include direct injection of the nucleic acid into the nucleus or pronucleus of a cell; electroporation; liposome transfer or transfer mediated by lipophilic materials; receptor mediated nucleic acid delivery, bioballistic or particle acceleration; calcium phosphate precipitation, and transfection mediated by viral vectors.

The compositions of this invention can be formulated and administered to reduce the symptoms associated with a gene fusion-associated cancer, e.g., glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma, by any means that produces contact of the active ingredient with the agent's site of action in the body of a subject, such as a human or animal (e.g., a dog, cat, or horse). They can be administered by any conventional means available for use in conjunction with pharmaceuticals, either as individual therapeutic active ingredients or in a combination of therapeutic active ingredients. They can be administered alone, but are generally administered with a pharmaceutical carrier selected on the basis of the chosen route of administration and standard pharmaceutical practice.

A therapeutically effective dose of EGFR fusion molecule or inhibitor can depend upon a number of factors known to those of ordinary skill in the art. The dose(s) of the EGFR fusion molecule inhibitor can vary, for example, depending upon the identity, size, and condition of the subject or sample being treated, further depending upon the route by which the composition is to be administered, if applicable, and the effect which the practitioner desires the EGFR fusion molecule inhibitor to have upon the nucleic acid or polypeptide of the invention. These amounts can be readily determined by a skilled artisan. Any of the therapeutic applications described herein can be applied to any subject in need of such therapy, including, for example, a mammal such as a dog, a cat, a cow, a horse, a rabbit, a monkey, a pig, a sheep, a goat, or a human.

Pharmaceutical compositions for use in accordance with the invention can be formulated in conventional manner using one or more physiologically acceptable carriers or excipients. The therapeutic compositions of the invention can be formulated for a variety of routes of administration, including systemic and topical or localized administration. Techniques and formulations generally can be found in Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, Pa. (20th Ed., 2000), the entire disclosure of which is herein incorporated by reference. For systemic administration, an injection is useful, including intramuscular, intravenous, intraperitoneal, and subcutaneous. For injection, the therapeutic compositions of the invention can be formulated in liquid solutions, for example in physiologically compatible buffers such as Hank's solution or Ringer's solution. In addition, the therapeutic compositions can be formulated in solid form and redissolved or suspended immediately prior to use. Lyophilized forms are also included. Pharmaceutical compositions of the present invention are characterized as being at least sterile and pyrogen-free. These pharmaceutical formulations include formulations for human and veterinary use.

According to the invention, a pharmaceutically acceptable carrier can comprise any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. The use of such media and agents for pharmaceutically active substances is well known in the art. Any conventional media or agent that is compatible with the active compound can be used. Supplementary active compounds can also be incorporated into the compositions.

A pharmaceutical composition containing an EGFR fusion molecule inhibitor can be administered in conjunction with a pharmaceutically acceptable carrier, for any of the therapeutic effects discussed herein. Such pharmaceutical compositions can comprise, for example, antibodies directed to an EGFR fusion molecule, or a variant thereof, or antagonists of an EGFR fusion molecule. The compositions can be administered alone or in combination with at least one other agent, such as a stabilizing compound, which can be administered in any sterile, biocompatible pharmaceutical carrier including, but not limited to, saline, buffered saline, dextrose, and water. The compositions can be administered to a patient alone, or in combination with other agents, drugs or hormones.

Sterile injectable solutions can be prepared by incorporating the EGFR fusion molecule inhibitor (e.g., a polypeptide or antibody) in the required amount in an appropriate solvent with one or a combination of ingredients enumerated herein, as required, followed by filtered sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated herein. In the case of sterile powders for the preparation of sterile injectable solutions, examples of useful preparation methods are vacuum drying and freeze-drying which yields a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

In some embodiments, the EGFR fusion molecule inhibitor can be applied via transdermal delivery systems, which slowly release the active compound for percutaneous absorption. Permeation enhancers can be used to facilitate transdermal penetration of the active factors in the conditioned media. Transdermal patches are described in, for example, U.S. Pat. Nos. 5,407,713; 5,352,456; 5,332,213; 5,336,168; 5,290,561; 5,254,346; 5,164,189; 5,163,899; 5,088,977; 5,087,240; 5,008,110; and 4,921,475.

“Subcutaneous” administration can refer to administration just beneath the skin (i.e., beneath the dermis). Generally, the subcutaneous tissue is a layer of fat and connective tissue that houses larger blood vessels and nerves. The size of this layer varies throughout the body and from person to person. The interface between the subcutaneous and muscle layers can be encompassed by subcutaneous administration. This mode of administration can be feasible where the subcutaneous layer is sufficiently thin so that the factors present in the compositions can migrate or diffuse from the locus of administration. Thus, where intradermal administration is utilized, the bolus of composition administered is localized proximate to the subcutaneous layer.

Administration of the cell aggregates (such as DP or DS aggregates) is not restricted to a single route, but can encompass administration by multiple routes. For instance, exemplary administrations by multiple routes include, among others, a combination of intradermal and intramuscular administration, or intradermal and subcutaneous administration. Multiple administrations can be sequential or concurrent. Other modes of application by multiple routes will be apparent to the skilled artisan.

In other embodiments, this implantation method will be a one-time treatment for some subjects. In further embodiments of the invention, multiple cell therapy implantations will be required. In some embodiments, the cells used for implantation will generally be subject-specific genetically engineered cells. In another embodiment, cells obtained from a different species or another individual of the same species can be used. Thus, using such cells can require administering an immunosuppressant to prevent rejection of the implanted cells. Such methods have also been described in U.S. Pat. No. 7,419,661 and PCT application publication WO 2001/32840, and are hereby incorporated by reference.

A pharmaceutical composition of the invention is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral, e.g., intravenous, intradermal, subcutaneous, oral (e.g., inhalation or ingestion), transdermal (topical), transmucosal, and rectal administration. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, or phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent that easy syringability exists. It must be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, a pharmaceutically acceptable polyol like glycerol, propylene glycol, liquid polyethylene glycol, and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, it can be useful to include isotonic agents, for example, sugars, polyalcohols such as mannitol, sorbitol, sodium chloride in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the inhibitor (e.g., a polypeptide or antibody or small molecule) of the invention in the required amount in an appropriate solvent with one or a combination of ingredients enumerated herein, as required, followed by filtered sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle which contains a basic dispersion medium and the required other ingredients from those enumerated herein. In the case of sterile powders for the preparation of sterile injectable solutions, examples of useful preparation methods are vacuum drying and freeze-drying which yields a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They can be enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier and subsequently swallowed.

Pharmaceutically compatible binding agents and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the active compounds are formulated into ointments, salves, gels, or creams as generally known in the art.

In some embodiments, the effective amount of the administered EGFR fusion molecule inhibitor is at least about any one of the following amounts, each expressed in μg/kg body weight: 0.0001, 0.00025, 0.0005, 0.00075, 0.001, 0.0025, 0.005, 0.0075, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9500, or 10,000 μg/kg body weight.
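As a purely arithmetic illustration of how a weight-based amount of this kind translates into a total administered amount, the following minimal sketch multiplies a dose in μg/kg by a body weight in kg. The function name `total_dose_ug` and the 70 kg body weight are hypothetical values chosen for illustration; they are not taken from this disclosure and do not represent a recommended regimen.

```python
def total_dose_ug(dose_ug_per_kg, body_weight_kg):
    """Return the total amount in micrograms for a weight-based dose.

    Both arguments are illustrative inputs; nothing here selects a dose.
    """
    if dose_ug_per_kg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and body weight positive")
    return dose_ug_per_kg * body_weight_kg


# Hypothetical example: 50 ug/kg for an assumed 70 kg subject.
dose = total_dose_ug(50, 70)
print(f"{dose} ug total ({dose / 1000} mg)")
```

Whether a given total amount is appropriate remains, as stated above, a determination for the skilled artisan.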

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Exemplary methods and materials are described below, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention.

All publications and other references mentioned herein are incorporated by reference in their entirety, as if each individual publication or reference were specifically and individually indicated to be incorporated by reference. Publications and references cited herein are not admitted to be prior art.

EXAMPLES

Examples are provided below to facilitate a more complete understanding of the invention. The following examples illustrate the exemplary modes of making and practicing the invention. However, the scope of the invention is not limited to specific embodiments disclosed in these Examples, which are for purposes of illustration only, since alternative methods can be utilized to obtain similar results.

Example 1: The Integrated Landscape of Driver Genomic Alterations in Glioblastoma

To address the challenge of driver mutations in glioblastoma (GBM) and uncover new driver genes in human GBM, a computational platform was developed that integrates the analysis of copy number variations and somatic mutations from a whole-exome dataset. The full spectrum of in-frame gene fusions was unveiled from a large transcriptome dataset of glioblastoma. The analyses revealed focal copy number variations and mutations in all the genes previously implicated in glioblastoma pathogenesis. Recurrent copy number variations and somatic mutations were detected in 18 genes not yet implicated in glioblastoma. For each of the new genes, the occurrence of focal and recurrent copy number changes in addition to somatic mutations underscores the relevance for glioblastoma pathogenesis. Without being bound by theory, mutations in LZTR-1, a Kelch-BTB-BACK-BTB-BACK adaptor of Cul3-containing E3 ligase complexes, impacted ubiquitination of LZTR-1 substrates. Loss-of-function mutations of CTNND2 (coding for δ-catenin) targeted a neural-specific gene and were associated with the transformation of glioma cells along the mesenchymal lineage, a hallmark of aggressive glioblastoma. Reconstitution of δ-catenin in mesenchymal glioma cells reprogrammed them towards a neuronal cell fate. Recurrent translocations were also identified that fuse in-frame the coding sequence of EGFR to several partners in 7.6% of tumors, with EGFR-Septin-14 scoring as the most frequent functional gene fusion in human glioblastoma. EGFR fusions enhance proliferation and motility of glioma cells and confer sensitivity to EGFR inhibition in glioblastoma xenografts. These results provide important insights into the pathogenesis of glioblastoma and highlight new targets for therapeutic intervention.

Glioblastoma (GBM) is the most common primary intrinsic malignant brain tumor affecting ~10,000 new patients each year with a median survival of only 12-15 months^(1,2). Identifying and understanding the functional significance of the genetic alterations that drive initiation and progression of GBM is crucial to develop more effective therapies. Previous efforts in GBM genome characterization included array-based profiling of copy number changes, methylation and gene expression and targeted sequencing of candidate genes³⁻⁶. These studies identified somatic changes in well-known GBM genes (EGFR, PTEN, IDH1, TP53, NF1, etc.) and nominated putative cancer genes with somatic mutations, but the functional consequences of most alterations are unknown. The lack of strict correlation between somatic alterations and functionality in GBM is manifested by regions of large copy number variations (CNVs), in which the relevant gene(s) are masked within genomic domains encompassing many other genes. Furthermore, although the potential of next-generation sequencing of the whole coding exome is widely recognized for the nomination of new cancer genes, the elevated somatic mutation rate of GBM is a significant challenge for statistical approaches aimed to distinguish genes harboring driver mutations from those with passenger mutations. A statistical approach was used to nominate driver genes in GBM from the integration of whole-exome sequencing data calling for somatic mutations with a CNV analysis that prioritizes focality and magnitude of the genetic alterations.

Chromosomal rearrangements resulting in recurrent and oncogenic gene fusions are hallmarks of hematological malignancies and recently they have also been uncovered in solid tumors (breast, prostate, lung and colorectal carcinoma)^(7,8). Recently, a small subset of GBM was reported to harbor FGFR-TACC gene fusions, and data were provided indicating that patients with FGFR-TACC-positive tumors would benefit from targeted FGFR kinase inhibition⁹. It remains unknown whether gene fusions involving other RTK-coding genes exist in GBM to create different oncogene addicting states. A large RNA-sequencing dataset of primary GBM and glioma stem cells (GSCs) was analyzed and the global landscape of in-frame gene fusions in human GBM was reported.

Nomination of Candidate GBM Genes

Focal CNVs and point mutations provide exquisite information on candidate driver genes by pinpointing their exact location. Without being bound by theory, the integration of somatic point mutations and focal CNV information in a single framework will nominate candidate genes implicated in GBM. MutComFocal is an algorithm designed for this purpose, in which driver genes are ranked by an integrated recurrence, focality and mutation score (see Methods). Overall, this strategy was applied to a cohort of 139 GBM and matched normal DNA analyzed by whole exome sequencing to identify somatic mutations, and to 469 GBM analyzed by the Affymetrix SNP6.0 platform to identify CNVs.

The whole-exome analysis identified a mean of 43 protein-changing somatic mutations per tumor sample. The distribution of substitutions shows a higher rate of transitions vs transversions (67%), with a strong preference for C->T and G->A (55%) (FIG. 6). As seen in other tumor types¹⁰, 19.2% of the mutations occurred in a CpG dinucleotide context (FIG. 7). Among somatic small nucleotide variants, the most frequently mutated genes have well-established roles in cancer, including GBM (TP53, EGFR, PTEN, and IDH1). In addition to known cancer genes, whole-exome sequencing identified several potentially new candidate driver genes mutated in ~5% of tumor samples. To uncover the most likely driver genes of GBM initiation and/or progression, the mutation results were integrated with common focal genomic alterations, detected using an algorithm applied to high-density SNP arrays to generate MutComFocal scores. This analysis stratified somatically mutated genes into three groups: recurrently mutated genes without significant copy number alterations (Mut), mutated genes in regions of focal and recurrent amplifications (Amp-Mut) and mutated genes in regions of focal and recurrent deletions (Del-Mut). Employing this framework, a list of 67 genes was generated that score at the top of each of the three categories and that included nearly all the genes previously implicated in GBM. These genes, which are labeled in green in FIG. 1, include IDH1 (Mut, FIG. 1a), PIK3C2B, MDM4, MYCN, PIK3CA, PDGFRA, KIT, EGFR, and BRAF (Amp-Mut, FIG. 1b) and PIK3R1, PTEN, RB1, TP53, NF1 and ATRX (Del-Mut, FIG. 1c). Interestingly, the analysis also selected 52 new candidate driver genes previously unreported in GBM. Based upon their role in development and homeostasis of the CNS and their potential function in oncogenesis and tumor progression, 24 genes were selected for re-sequencing in an independent dataset of 35 GBM and matched normal controls. Eighteen genes were found somatically mutated by Sanger sequencing in the independent panel and are labeled in red in FIG. 1. Each of the validated new GBM genes is targeted by somatic mutations and CNVs in a cumulative fraction ranging between 2.9% and 45.7% of GBM (FIG. 9).
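The substitution statistics reported above (fraction of transitions, fraction of mutations in a CpG context) can be illustrated with a minimal sketch. This is not the pipeline used in the Example; the tuple layout, the function names and the four toy variants are assumptions introduced only to show how a transition and a CpG-context substitution are recognized.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True when ref and alt are both purines or both pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def in_cpg_context(ref, prev_base, next_base):
    """True when the mutated base is the C or the G of a CpG dinucleotide.

    prev_base and next_base are the reference bases flanking the mutated
    position on the reported strand (an assumed input layout).
    """
    ref = ref.upper()
    return (ref == "C" and next_base.upper() == "G") or (
        ref == "G" and prev_base.upper() == "C"
    )


def summarize(variants):
    """variants: iterable of (ref, alt, prev_base, next_base) tuples."""
    variants = list(variants)
    n = len(variants)
    transitions = sum(is_transition(r, a) for r, a, _, _ in variants)
    cpg = sum(in_cpg_context(r, p, nx) for r, _, p, nx in variants)
    return {"n": n, "transition_fraction": transitions / n, "cpg_fraction": cpg / n}


# Toy input, not data from the Example.
toy = [("C", "T", "A", "G"), ("G", "A", "C", "T"),
       ("A", "C", "T", "T"), ("T", "C", "G", "A")]
print(summarize(toy))
```

On real data the same counts would be taken over all somatic single-nucleotide variants called across the cohort.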

Among the commonly mutated and focally deleted genes that exhibited top MutComFocal scores and were validated in the independent GBM dataset were BCOR, LRP family members, HERC2, LZTR-1 and CTNND2. BCOR, a chromosome X-linked gene, encodes a component of the nuclear co-repressor complex that is essential for normal development of the neuroectoderm and stem cell functions¹¹⁻¹³. BCOR mutations have recently been described in retinoblastoma and medulloblastoma, thus indicating that loss-of-function mutations in BCOR are common genetic events in neuroectodermal tumors^(14,15). LRP1B is a member of the LDL receptor family and is among the most frequently mutated genes in human cancer (FIG. 1c)¹⁶. Interestingly, two other LDL receptor family members (LRP2 and LRP1) are mutated in 4.4% and 2.9% of tumors, respectively (FIG. 1a). The LRP proteins are highly expressed in the neuroepithelium and are essential for morphogenesis of the forebrain in mouse and humans^(17,18). The tumor suppressor function of LRP proteins in GBM may be linked to their ability to promote chemosensitivity and control signaling through the Sonic hedgehog pathway, which is responsible for maintenance of cancer initiating cells in GBM¹⁹⁻²¹. The gene coding for the Hect ubiquitin ligase Herc2 is localized on chromosome 15q13 and is deleted and mutated in 15.1% and 2.2% of GBM cases, respectively. This gene has been implicated in severe neurodevelopmental syndromes. Moreover, protein substrates of Herc2 are crucial factors in genome stability and DNA damage-repair, two cell functions frequently disrupted in cancer^(22,23).

Loss-of-Function Genetic Alterations Target the LZTR-1 and CTNND2 Genes in GBM

A gene that received one of the highest Del-Mut scores by MutComFocal is LZTR-1 (FIG. 1c). The LZTR-1 coding region had non-synonymous mutations in 4.4% and the LZTR-1 locus (human chromosome 22q11) was deleted in 22.4% of GBM. LZTR-1, which is normally expressed in the human brain, codes for a protein with a characteristic Kelch-BTB-BACK-BTB-BACK domain architecture (FIGS. 8, 9). The LZTR-1 gene is highly conserved in the metazoans and was initially proposed to function as a transcriptional regulator, but follow-up studies have excluded a transcriptional role for this protein²⁴. Most proteins with BTB-BACK domains are substrate adaptors in Cullin3 (Cul3) ubiquitin ligase complexes, in which the BTB-BACK region binds to the N-terminal domain of Cul3, while a ligand binding domain, often a Kelch 6-bladed β-propeller motif, binds to substrates targeted for ubiquitination²⁵. To ask whether LZTR-1 directly binds Cul3, co-immunoprecipitation experiments were performed. FIG. 2a shows that Cul3 immunoprecipitates contain LZTR-1, thus indicating that LZTR-1 is an adaptor in Cul3 ubiquitin ligase complexes.

To address the potential function of LZTR-1 mutants, a homology model of LZTR-1 was built based in part on the crystal structures of the MATH-BTB-BACK protein SPOP²⁶, the BTB-BACK-Kelch proteins KLHL3 and KLHL11²⁷, and the Kelch domain of Keap1²⁸ (FIG. 2b). Without being bound by theory, the second BTB-BACK region of LZTR-1 binds Cul3 because of the presence of a ϕ-X-E motif in this BTB domain, followed by a 3-Box/BACK region (FIG. 9)²⁶. However, the preceding BTB-BACK region also participates in Cul3 binding. Four of the six LZTR-1 mutations identified in GBM are located within the Kelch domain and target highly conserved amino acids (FIG. 2b, c, FIG. 8). Interestingly, the concentration of LZTR-1 mutations in the Kelch domain reflects a similar pattern of mutations in the Kelch-coding region of the KLHL3 gene, recently identified in families with hypertension and electrolytic abnormalities^(29,30). The R198G and G248R mutations localize to the b-c loop of the Kelch domain, in a region predicted to provide the substrate-binding surface of the domain²⁸. The W105R mutation targets a highly conserved anchor residue in the Kelch repeats and the T288I mutation disrupts a buried residue that is conserved in LZTR-1 (FIG. 2b, FIG. 8). Both of these mutations are expected to perturb the folding of the Kelch domain. The remaining two mutations, located in the BTB-BACK domains, are predicted to affect the interaction with Cul3 either by removing the entire BTB-BACK-BTB-BACK region (W437STOP) or by disrupting the folding of the last helical hairpin in the BTB-BACK domain (R810W, FIG. 2b). The pattern of mutations of LZTR-1 in GBM indicates that they impair binding either to specific substrates or to Cul3.

Among the top ranking genes in MutComFocal, CTNND2 is the gene expressed at the highest levels in the normal brain. CTNND2 codes for δ-catenin, a member of the p120 subfamily of catenins that is expressed almost exclusively in the nervous system where it is crucial for neurite elongation, dendritic morphogenesis and synaptic plasticity³¹⁻³³. Germ-line hemizygous loss of CTNND2 severely impairs cognitive function and underlies some forms of mental retardation^(34,35). CTNND2 shows a pronounced clustering of mutations in GBM. The observed spectrum of mutations includes four mutations located in the armadillo-coding domain and one in the region coding for the N-terminal coiled-coil domain (FIG. 10a). These regions are the two most relevant functional domains of δ-catenin and each of the mutations targets highly conserved residues with probably (K629Q, A776T, S881L, D999E) and possibly (A71T) damaging consequences³⁶. Together with focal genomic losses of CTNND2 (FIG. 10b), the mutation pattern indicates that CTNND2 is a tumor suppressor gene in GBM.

It was asked whether the expression of CTNND2 is down-regulated during oncogenic transformation in the CNS. Immunostaining experiments showed that δ-catenin is strongly expressed in the normal human and mouse brain with the highest expression in neurons (FIG. 3a, FIG. 10c). Conversely, the immunostaining analysis of 69 GBM revealed negligible or absent expression of δ-catenin in 21 cases (FIG. 3b). Oncogenic transformation in the CNS frequently results in loss of the default proneural cell fate in favor of an aberrant mesenchymal phenotype, which is associated with a very aggressive clinical outcome³⁷. The analysis of gene expression profiles of 498 GBM from the ATLAS-TCGA collection showed that low expression of CTNND2 is strongly enriched in tumors identified by a mesenchymal gene expression signature (T-test p-value = 2.4 × 10⁻¹², FIG. 10d). Tumors with low CTNND2 expression were also characterized by poor clinical outcome and, among them, tumors with copy number losses of the CTNND2 gene displayed the worst prognosis (FIG. 3c, d). Mesenchymal transformation of GBM, which is detected in the vast majority of established glioma cell lines, is associated with an apparently irreversible loss of the proneural cell fate and neuronal markers³⁷. Expression of δ-catenin in the U87 human glioma cell line reduced cell proliferation (FIG. 3e), decreased the expression of mesenchymal markers (FIG. 3f) and induced neuronal differentiation as shown by elongation of β3-tubulin-positive neurites and development of branched dendritic processes that stained positive for the post-synaptic marker PSD95 (FIG. 3g). Accordingly, δ-catenin decreased expression of cyclin A, an S-phase cyclin, and up-regulated the Cdk inhibitor p27^(Kip1) and the neuronal-specific gene N-cadherin (FIG. 3h). Thus, restoring the normal expression of δ-catenin reprograms mesenchymal glioma cells towards the proneural lineage.

Recurrent EGFR Fusions in GBM

To identify gene fusions in GBM, RNA-seq data were analyzed from a total of 185 GBM samples (161 primary GBM plus 24 short-term cultures of glioma stem-like cells (GSCs) freshly isolated from patients carrying primary GBM). The analysis of the RNA-seq dataset led to the discovery of 92 candidate rearrangements that give rise to in-frame fusion transcripts (FIG. 27 ). Besides the previously reported FGFR3-TACC3 fusion events, the most frequent recurrent in-frame fusions involved EGFR in 7.6% of samples (14/185, 3.8%-11.3% CI). Nine of the 14 EGFR fusions included the recurrent partners SEPT14 (6/185, 3.2%) and PSPH (3/185, 1.6%) as the 3′ gene segment in the fusion. Two in-frame highly expressed fusions were also found involving the neurotrophic tyrosine kinase receptor 1 gene (NTRK1) as the 3′ gene with two different 5′ partners (NFASC-NTRK1 and BCAN-NTRK1). Fusions with a similar structure involving NTRK1 are commonly found in papillary thyroid carcinomas³⁸. Using EXome-Fuse, an algorithm for the reconstruction of genomic fusions from whole-exome data, it was determined that EGFR-SEPT14 and NTRK1 fusions are the result of recurrent chromosomal translocations, and the corresponding genomic breakpoints were reconstructed (FIG. 27 ).
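The reported EGFR fusion frequency and its interval can be approximated with a simple binomial calculation. The following is a minimal sketch, assuming a normal-approximation (Wald) 95% interval; the text does not state which interval method was used, and the function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a Wald interval.

    Assumption: normal approximation with z = 1.96 (roughly 95% coverage);
    the source does not specify how its interval was computed.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# 14 EGFR fusions among 185 RNA-seq samples, as reported in the text
p, lo, hi = proportion_ci(14, 185)
print(f"{100 * p:.1f}% ({100 * lo:.1f}%-{100 * hi:.1f}%)")
```

Under this assumption the bounds land within about 0.1 percentage point of the reported 3.8%-11.3% interval; an exact or score-based interval would give slightly different values.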

By sequencing the PCR products spanning the fusion breakpoint, each of the three types of recurrent in-frame fusion predictions (EGFR-SEPT14, EGFR-PSPH and NTRK1 fusions, FIG. 4 , FIG. 11 , and FIG. 12 ) was validated. In FIG. 4 a, b the prediction and cDNA sequence validation are shown, respectively, for one of the tumors harboring an EGFR-SEPT14 fusion (TCGA-27-1837). The amplified cDNA contained an open reading frame for a protein of 1,041 amino acids resulting from the fusion of an EGFR amino-terminal portion of residues 1-982 with a SEPT14 carboxy-terminal portion of residues 373-432 (FIG. 4 c ). Thus, the structure of the EGFR-Septin14 fusion proteins involves EGFR at the N-terminus, providing a receptor tyrosine kinase domain fused to a coiled-coil domain from Septin14. Exon-specific gene expression analysis from the RNA-seq coverage in TCGA-27-1837 demonstrated that the EGFR and SEPT14 exons implicated in the fusion are highly overexpressed compared with the mRNA sequences not included in the fusion event (FIG. 13 ). Using PCR, the genomic breakpoint coordinates were mapped to chromosome 7 (#55,268,937 for EGFR and #55,870,909 for SEPT14, genome build GRCh37/hg19) falling within EGFR exon 25 and SEPT14 intron 9, which gives rise to a transcript in which the 5′ EGFR exon 24 is spliced to the 3′ SEPT14 exon 10 (FIG. 4 d ). Interestingly, the fused EGFR-PSPH cDNA and predicted fusion protein in the GBM sample TCGA-06-5408 involve the same EGFR N-terminal region implicated in the EGFR-SEPT14 fusion, with PSPH providing a carboxy-terminal portion of 35 amino acids (FIG. 11 ). An example of a fusion in which the EGFR-TK region is the 3′ partner is the CAND1-EGFR fusion in GSC-3316 (FIG. 14 ). Thus, either in the more frequent fusions in which EGFR is the 5′ partner or in those with EGFR as the 3′ gene, the region of the EGFR mRNA coding for the TK domain is invariably retained in each of the fusion transcripts (FIG. 27 ). RT-PCR and genomic PCR followed by Sanger sequencing from GBM TCGA-06-5411 were also used to successfully validate the NFASC-NTRK1 fusions in which the predicted fusion protein includes the TK domain of the high-affinity NGF receptor (TrkA) fused downstream to the immunoglobulin-like region of the cell adhesion and ankyrin-binding region of neurofascin (FIG. 12 ).

To confirm that GBM harbor recurrent EGFR fusions and to determine their frequency in an independent dataset, cDNA from a panel of 248 GBMs was screened, which revealed 10 additional cases harboring EGFR-SEPT14 fusions (4%). Conversely, NFASC-NTRK1 fusions were not detected in this dataset. The frequency of EGFR-PSPH fusions was 2.2% (3/135).

The discovery of recurrent EGFR fusions in GBM is of particular interest. EGFR is activated in a significant fraction of primary GBM (~25%) by an in-frame deletion of exons 2-7 (EGFRvIII)³⁹. To establish the functional relevance of EGFR fusions, it was determined whether the most frequent EGFR fusion in GBM (EGFR-SEPT14) provides an alternative mechanism of EGFR activation and confers sensitivity to EGFR inhibition. The EGFR-SEPT14 cDNA was cloned, and lentiviruses expressing EGFR-SEPT14, EGFRvIII or EGFR wild type were prepared. Transduction of the SNB19 glioma cell line (which lacks genomic alteration of EGFR) with the recombinant lentiviruses showed that cells expressing EGFR-SEPT14 or EGFRvIII proliferated at a rate that was 2-fold higher than control cells or cells expressing wild type EGFR (FIG. 5 a ). Furthermore, EGFR-SEPT14 and EGFRvIII markedly enhanced the ability of SNB19 cells to migrate in a wound assay (FIG. 5 b, c ). Finally, it was investigated whether EGFR-SEPT14 fusions confer sensitivity to inhibition of EGFR-TK activity in vivo. The analysis of a collection of 30 GBM xenografts directly established in the mouse from human GBM identified one xenograft model (D08-0537 MG) harboring the EGFR-SEPT14 fusion. The D08-0537 MG xenograft had been established from a heavily pretreated GBM. Treatment of D08-0537 MG tumors with two EGFR inhibitors showed that each of the two drugs significantly delayed the rate of tumor growth (FIG. 5 d ). Interestingly, lapatinib, an irreversible EGFR inhibitor recently proposed to target EGFR alterations in GBM⁴⁰, displayed the strongest anti-tumor effects (FIG. 5 d, e ). Conversely, EGFR inhibitors were ineffective against the GBM xenograft D08-0714 MG, which lacks genomic alterations of the EGFR gene (FIG. 5 d, e ). Taken together, these data establish that the EGFR-SEPT14 fusion confers a proliferative and migratory phenotype to glioma cells and imparts sensitivity to EGFR inhibition to human glioma harboring the fusion gene.

Discussion

A computational pipeline is described for the nomination of somatic cancer genes. This approach combines the frequency, magnitude and focality of CNVs at any locus in the human genome with the somatic mutation rate for the genes residing at that genomic location. Thus, two of the genetic hallmarks of driver cancer genes (focality of copy number aberrations and point mutations) are integrated into a single score. The approach identifies marks of positive somatic selection in large unbiased cancer genome studies by efficiently removing the large burden of passenger mutations that characterize most human tumors and will be applicable to the dissection of the genomic landscape of other cancer types.

Besides recognizing nearly all the known genes reported to have functional relevance in GBM, our study discovered and validated somatic mutations in 18 new genes, which also harbor focal and recurrent CNVs in a significant fraction of GBM. For some of these genes, their importance extends beyond GBM, as underscored by cross-tumor relevance (e.g. BCOR), and protein family recurrence (e.g. LRP family members). For example, mutations of LZTR-1 have been reported in other tumors. In particular, mutations of the highly conserved residues in the Kelch domain (W105, G248, T288) and in the second BTB-BACK domain (R810) reported here are recurrent events in other tumor types⁴¹. Thus, understanding the nature of the substrates of LZTR-1-Cul3 ubiquitin ligase activity will provide important insights into the pathogenesis of multiple cancer types.

The identification of genetic and epigenetic loss-of-function alterations of the CTNND2 gene clustered in mesenchymal GBM provides a clue to the genetic events driving this aggressive GBM subtype. The important functions of δ-catenin for such crucial neuronal morphogenesis activities as the coordinated control of axonal and dendritic arborization indicate that full-blown mesenchymal transformation in the brain requires loss of the master regulators constraining cell determination in the CNS along the default neuronal lineage. The ability of δ-catenin to reprogram glioma cells that express mesenchymal genes towards a neuronal fate reveals an unexpected plasticity of mesenchymal GBM that might be exploited therapeutically.

In this study, the landscape of gene fusions is reported from a large dataset of GBM analyzed by RNA-Sequencing. In-frame gene fusions retaining the RTK-coding domain of EGFR emerged as the most frequent gene fusion events in GBM. In this tumor, EGFR is frequently targeted by focal amplifications and our finding underscores the strong recombinogenic probability of focally amplified genes, as recently reported for the myc locus in medulloblastoma⁴². Resembling intragenic rearrangements that generate the EGFRvIII allele, EGFR-SEPT14 fusions enhance the proliferative and migratory capacity of glioma cells. They also confer sensitivity to EGFR inhibition to human GBM grown as mouse xenografts. These findings highlight the relevance of gene fusions implicating RTK-coding genes in the pathogenesis of GBM⁹. They also provide a strong rationale for the inclusion of GBM patients harboring EGFR fusions in clinical trials based on EGFR inhibitors.

Methods

139 paired tumor-normal samples from TCGA were analyzed with the SAVI pipeline⁴³. The SAVI algorithm estimates frequencies of variant alleles in a sample as well as the difference in allele frequency between paired samples. The algorithm establishes posterior high credibility intervals for those frequencies and differences of frequencies, which can be used for genotyping the samples on the one hand, and detecting somatic mutations in the case of tumor/normal pairs of samples on the other. The algorithm allows for random sequencing errors and uses the Phred scores of the sequenced alleles as an estimate of their reliability. To integrate point mutation data with CNV data from 469 GBM (Affymetrix SNP6.0), MutComFocal (see below) was used. The MutComFocal algorithm assigns a driver score to each gene through three different strategies that give priority to lesions, samples, and genes in which there is less uncertainty regarding potential tumorigenic drivers. First, the focality component of the score is inversely proportional to the size of the genomic lesion to which a gene belongs and thus prioritizes more focal genomic lesions. Second, the recurrence component of the MutComFocal score is inversely proportional to the total number of genes altered in a sample, which prioritizes samples with a smaller number of altered genes. Finally, the mutation component of the score is inversely proportional to the total number of genes mutated in a sample, which achieves the two-fold goal of prioritizing mutated genes on one hand, and samples with a smaller number of mutations on the other.

161 RNA-Seq GBM tumor samples from TCGA were also analyzed, plus 24 generated from our own dataset of GSCs. Nine of these samples previously reported in other studies were kept in the list to evaluate recurrence⁹. The samples were analyzed by means of the ChimeraScan algorithm in order to detect a list of gene fusion candidates⁴⁴. Using the Pegasus annotation pipeline (http://sourceforge.net/projects/pegasus-fus/), the fusion transcript was reconstructed, the reading frame was annotated and protein domains were detected that are either conserved or lost in the new chimeric event. The genomic breakpoint of recurrent gene fusion RNA transcripts was also probed for using whole-exome sequencing data (EXome-Fuse algorithm)⁹. The Kaplan-Meier survival analyses for CTNND2 CNV and CTNND2 expression were obtained using the REMBRANDT glioma dataset.

SAVI (Statistical Algorithm for Variant Frequency Identification):

The frequency of alleles in a sample was estimated by the SAVI pipeline, which constructs an empirical Bayesian prior for those frequencies, using data from the whole sample, and obtains a posterior distribution and high credibility intervals for each allele^(S1). The prior and posterior are distributed over a discrete set of frequencies with a precision of 1% and are connected by a modified binomial likelihood, which allows for some error rate. More precisely, a prior distribution p(f) of the frequency f and a prior for the error e uniform on the interval [0, E] were assumed for a fixed 0≤E≤1. The sequencing data at a particular allele is a random experiment producing a string of m (the total depth at the allele) bits with n “1”s (the variant depth at the allele). Assuming a binomial likelihood of the data and allowing for bits being misread due to random errors, the posterior probability P(f) of the frequency f is

$P(f) = \frac{p(f)}{C} \cdot \frac{1}{b - a} \int_{f}^{f + E - 2Ef} x^{n} (1 - x)^{m - n} \, dx$

where C is a normalization constant. For a particular allele, the value of E is determined by the quality of the nucleotides sequenced at that position as specified by their Phred scores. The SAVI pipeline takes as input the reads produced by the sequencing technology, filters out low quality reads and maps the rest onto a human reference genome. After mapping, a Bayesian prior for the distribution of allele frequencies for each sample is constructed by an iterative posterior update procedure, starting with a uniform prior. To genotype the sample, the posterior high credibility intervals were used for the frequency of the alleles at each genomic location. Alternatively, combining the Bayesian priors from different samples, posterior high credibility intervals were obtained for the difference between the samples of the frequencies of each allele. Finally, the statistically significant differences between the tumor and normal samples are reported as somatic variants. To estimate the positive predictive value of SAVI in the TCGA GBM samples, 41 mutations were selected for independent validation by Sanger sequencing. 39 of the 41 mutations were confirmed using Sanger sequencing, resulting in a 0.95 (95% CI 0.83-0.99) validation rate.
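The posterior P(f) can be evaluated numerically on the 1% frequency grid. The sketch below is illustrative only and is not the SAVI implementation: it assumes that a and b denote the integration limits f and f + E − 2Ef (so that the integral divided by b − a is the average likelihood over the error range), it starts from a uniform prior, and it uses a midpoint rule. The function names are hypothetical.

```python
def mean_likelihood(f, n, m, E, steps=200):
    """Average of x^n * (1 - x)^(m - n) for x between f and f + E - 2*E*f.

    This equals the integral divided by (b - a). The midpoint rule also
    covers the degenerate case b == a (f = 0.5) without dividing by zero.
    """
    a, b = f, f + E - 2 * E * f
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * (b - a) / steps
        total += x ** n * (1 - x) ** (m - n)
    return total / steps

def posterior(n, m, E, prior=None):
    """Posterior over frequencies 0.00, 0.01, ..., 1.00 for n variant reads out of m."""
    grid = [i / 100 for i in range(101)]
    if prior is None:
        prior = [1 / len(grid)] * len(grid)   # uniform prior (assumption)
    unnorm = [p * mean_likelihood(f, n, m, E) for f, p in zip(grid, prior)]
    C = sum(unnorm)                            # normalization constant
    return grid, [u / C for u in unnorm]

# Hypothetical allele: 30 variant reads at depth 100, error bound E = 0.01
grid, post = posterior(n=30, m=100, E=0.01)
mode = grid[post.index(max(post))]
```

A credibility interval can then be read off by accumulating the posterior mass around the mode until the desired coverage is reached.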

Candidate genes were ranked by the number of somatic non-synonymous mutations. A robust fit of the ratio of non-synonymous to synonymous mutations was generated with a bisquare weighting function. Excess of non-synonymous alterations was estimated using a Poisson distribution with mean equal to the product of the ratio from the robust fit and the number of synonymous mutations. Genes in highly polymorphic genomic regions were filtered out based on an independent cohort of normal samples. The list of these regions includes families of genes known to generate false positives in somatic predictions (e.g. HLA, KRT and OR).
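The Poisson test for an excess of non-synonymous alterations can be sketched as follows. This is a minimal illustration, assuming a one-sided upper-tail probability; the fitted ratio is taken as given (the robust fit itself is not reproduced here), and the function names and example counts are hypothetical.

```python
import math

def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k)
                for k in range(observed))
    return max(0.0, 1.0 - below)

def nonsynonymous_excess_pvalue(n_nonsyn, n_syn, fitted_ratio):
    """Tail probability of seeing at least n_nonsyn non-synonymous mutations.

    fitted_ratio is the non-synonymous:synonymous ratio from the robust fit;
    the Poisson mean is fitted_ratio * n_syn, as described in the text.
    """
    return poisson_upper_tail(n_nonsyn, fitted_ratio * n_syn)

# Hypothetical gene: 12 non-synonymous and 2 synonymous mutations, fitted ratio 2.5
p_value = nonsynonymous_excess_pvalue(12, 2, 2.5)
```

A small tail probability indicates more non-synonymous mutations than the background ratio would predict for that gene.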

MutComFocal. Key cancer genes are often found amplified or deleted in chromosomal regions containing many other genes. Point mutations and gene fusions, on the other hand, provide more specific information about which genes may be implicated in the oncogenic process. MutComFocal, a Bayesian approach aiming to identify driver genes by integrating CNV and point mutation data, was developed.

For a particular sample, let (c₁,N₁), . . . , (c_(k),N_(k)) describe the amplification lesions in that sample so that N_(i) is the number of genes in the i-th lesion and c_(i) is its copy number change from normal. For a gene belonging to the i-th lesion, the amplification recurrence sample score is defined as c_(i)/(Σ_(j)c_(j)·N_(j)) and its amplification focality sample score is defined as (c_(i)/Σ_(j)c_(j))·(1/N_(i)). To obtain the amplification recurrence and focality scores for a particular gene, the corresponding sample scores were summed over all samples and the result was normalized so that each score sums to 1. The deletion recurrence and focality scores are defined in a similar manner. The mutation score is analogous to a recurrence score in which it was assumed that mutated genes belong to lesions with only one gene.
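The sample-level recurrence and focality scores can be illustrated with a short sketch. It assumes that each gene belongs to at most one lesion per sample and represents a lesion as a (copy number change, gene list) pair; the function names and the example lesions are hypothetical and are not taken from the MutComFocal code.

```python
from collections import defaultdict

def sample_scores(lesions):
    """Per-gene recurrence and focality scores for one sample.

    lesions: list of (copy_change, genes) pairs, where genes lists the
    genes in that lesion (so N_i = len(genes)).
    """
    weighted = sum(c * len(genes) for c, genes in lesions)   # sum_j c_j * N_j
    total_c = sum(c for c, _ in lesions)                      # sum_j c_j
    recurrence, focality = {}, {}
    for c, genes in lesions:
        for g in genes:
            recurrence[g] = c / weighted
            focality[g] = (c / total_c) * (1 / len(genes))
    return recurrence, focality

def gene_scores(samples):
    """Sum sample scores over all samples, then normalize each score to sum to 1."""
    rec, foc = defaultdict(float), defaultdict(float)
    for lesions in samples:
        r, f = sample_scores(lesions)
        for g, v in r.items():
            rec[g] += v
        for g, v in f.items():
            foc[g] += v
    rec_total, foc_total = sum(rec.values()), sum(foc.values())
    return ({g: v / rec_total for g, v in rec.items()},
            {g: v / foc_total for g, v in foc.items()})

# Hypothetical sample: a focal one-gene lesion and a broad four-gene lesion
r, f = sample_scores([(2.0, ["G1"]), (1.0, ["G2", "G3", "G4", "G5"])])
```

In this example the gene in the focal lesion receives a focality score of 2/3, while each gene in the broad lesion receives 1/12, which shows how the 1/N_(i) factor favors focal events.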

The amplification/mutation score is defined as the product of the two amplification scores and the mutation score, while the deletion/mutation score is defined as the product of the two deletion scores and the mutation score. The amplification/mutation and deletion/mutation scores are normalized to 1 and, for each score, genes are divided into tiers iteratively, so that the top 2^(H) remaining genes are included in the next tier, where H is the entropy of the scores of the remaining genes normalized to 1. Based on their tier across the different types of scores, genes are assigned to being either deleted/mutated or amplified/mutated and genes in the top tiers are grouped into contiguous regions. The top genes in each region are considered manually and selected for further functional validation.
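The iterative tiering rule can be sketched as below. The sketch assumes base-2 entropy (consistent with a tier size of 2^(H)) and rounds the tier size to the nearest integer with a minimum of one gene; the text does not specify the rounding rule, and the function name and example scores are hypothetical.

```python
import math

def entropy_tiers(scores):
    """Split genes into tiers of size 2**H, H = entropy (bits) of the remaining scores.

    scores: dict mapping gene name to a non-negative score.
    Assumptions: base-2 entropy; tier size rounded to the nearest integer, at least 1.
    """
    remaining = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    tiers = []
    while remaining:
        total = sum(v for _, v in remaining)
        if total == 0:
            # No information left to rank by: place the rest in a final tier.
            tiers.append([g for g, _ in remaining])
            break
        probs = [v / total for _, v in remaining if v > 0]
        H = -sum(p * math.log2(p) for p in probs)
        size = max(1, round(2 ** H))
        tiers.append([g for g, _ in remaining[:size]])
        remaining = remaining[size:]
    return tiers

# Hypothetical scores: one dominant gene followed by three equally weak ones
tiers = entropy_tiers({"A": 0.97, "B": 0.01, "C": 0.01, "D": 0.01})
```

When one gene dominates, the entropy is low and the first tier is small; when the remaining scores are nearly uniform, the entropy is high and the next tier absorbs many genes at once.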

The recurrence and focality scores can be interpreted as the posterior probabilities that a gene is driving the selection of the disease, under two different priors for this: one global and one local in nature. The recurrence score is higher if a gene participates in many samples that do not have too many altered genes, while the focality score is higher if the gene participates in many focal lesions. Besides lending strong support to the inference of a gene as a potential driver, the directionality of the copy number alteration (amplification or deletion) informs us of the likely behavior of the candidate gene as an oncogene or tumor suppressor, respectively.

The genes displayed in FIG. 1 are selected based on the MutComFocal ranking (top 250 genes), the size of minimal region (less than 10 genes) and frequency of mutations (more than 2% for deletion/mutations and at least 1% in amplification/mutations).

RNA-Seq bioinformatics analysis. 161 RNA-Seq GBM tumor samples were analyzed from The Cancer Genome Atlas (TCGA), a public repository containing large-scale genome-sequencing of different cancers, plus 24 patient-derived GSCs. Nine of the GSC samples reported in previous studies were kept in the list to evaluate recurrence^(S2). The samples were analyzed by means of the ChimeraScan^(S3) algorithm in order to detect a list of gene fusion candidates. Briefly, ChimeraScan detects those reads that discordantly align to different transcripts of the same reference (split inserts). These reads provide an initial set of putative fusion candidates. Finally, the algorithm realigns the initially unmapped reads to the putative fusion candidates and detects those reads that align across the junction boundary (split reads). These reads provide the genomic coordinates of the breakpoint.

RNA-Seq analysis detected a total of 39,329 putative gene fusion events. In order to focus the experimental analysis on biologically relevant fused transcripts, the Pegasus annotation pipeline (http://sourceforge.net/projects/pegasus-fus/) was applied. For each putative fusion, Pegasus reconstructs the entire fusion sequence on the basis of genomic fusion breakpoint coordinates and gene annotations. Pegasus also annotates the reading frame of the resulting fusion sequences as either in-frame or frame-shift. Moreover, Pegasus detects the protein domains that are either conserved or lost in the new chimeric event by predicting the amino acid sequence and automatically querying the UniProt web service. On the basis of the Pegasus annotation report, relevant gene fusions were selected for further experimental validation according to the reading frame and the conserved/lost domains. The selected list (FIG. 27 ) was based on in-frame events expressed by ten or more reads and at least one read spanning the breakpoint. To filter out candidate trans-splicing events, the analysis focused on events with putative breakpoints at a distance of at least 25 kb.
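The selection criteria described above can be expressed as a simple filter. The sketch below is illustrative and does not reflect the Pegasus output format: the `FusionCandidate` record, its field names and the example candidates are hypothetical, and it assumes that the 25 kb distance criterion applies only when both breakpoints lie on the same chromosome.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FusionCandidate:
    name: str
    in_frame: bool
    total_reads: int
    spanning_reads: int
    breakpoint_distance_bp: Optional[int]   # None for inter-chromosomal events

def passes_filters(c, min_reads=10, min_spanning=1, min_distance_bp=25_000):
    """Apply the selection criteria described in the text to one candidate."""
    if not c.in_frame:
        return False
    if c.total_reads < min_reads or c.spanning_reads < min_spanning:
        return False
    # Assumption: the distance filter is skipped for inter-chromosomal events.
    if c.breakpoint_distance_bp is not None and c.breakpoint_distance_bp < min_distance_bp:
        return False
    return True

candidates = [
    FusionCandidate("GENE_A-GENE_B", True, 42, 5, 600_000),   # hypothetical, passes
    FusionCandidate("GENE_C-GENE_D", True, 15, 2, 8_000),     # breakpoints too close
    FusionCandidate("GENE_E-GENE_F", False, 30, 4, None),     # frame-shift
]
selected = [c.name for c in candidates if passes_filters(c)]
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the selected list is to the read-support and distance cutoffs.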

EXome-Fuse: Identification of Genetic Rearrangements using Whole-Exome Data. Although whole-exome sequencing data contains low intronic coverage that reduces the sensitivity for fusion discovery, it is readily available through the TCGA database. To characterize the genomic breakpoint of the chromosomal rearrangement, EXome-Fuse was designed: a gene fusion discovery pipeline particularly designed to analyze whole-exome data. For the samples harboring EGFR-SEPT14, EGFR-PSPH, NFASC-NTRK1, and BCAN-NTRK1 fusions in RNA, EXome-Fuse was applied to the corresponding whole-exome sequencing data deposited in TCGA. This algorithm can be divided into three stages: split insert identification, split read identification, and virtual reference alignment. Mapping against the human genome reference hg18 with BWA, all split inserts were first identified to compile a preliminary list of fusion candidates. This list was purged of any false positives produced from paralogous gene pairs using the Duplicated Genes Database and the Ensembl Compara GeneTrees^(S4). Pseudogenes in the candidate list were annotated using the list from HUGO Gene Nomenclature Committee (HGNC) databases^(S5) and given lower priority. Candidates between homologous genes were also filtered out, as well as those with homologous or low-complexity regions around the breakpoint. For the remaining fusion candidates, any supporting split reads and their mates were probed using BLAST with a word size of 16, identity cutoff of 90%, and an expectation cutoff of 10⁻⁴. Finally, a virtual reference was created for each fusion transcript and all reads were re-aligned to calculate a final tally of split inserts and split reads such that all aligning read pairs maintain F-R directionality.

Targeted Exon Sequencing

All protein-coding exons for the 24 genes of interest were sequenced using genomic DNA extracted from frozen tumors and matched blood. 500 ng of DNA from each sample were sheared to an average of 150 bp in a Covaris instrument for 360 seconds (duty cycle: 10%; intensity: 5; cycles/burst: 200). Barcoded libraries were prepared using the Kapa High-Throughput Library Preparation Kit Standard (Kapa Biosystems). Libraries were amplified using the KAPA HiFi Library Amplification kit (Kapa Biosystems) (8 cycles). Libraries were quantified using Qubit Fluorimetric Quantitation (Invitrogen) and the quality and size assessed using an Agilent Bioanalyzer. An equimolar pool of the 4 barcoded libraries (300 ng each) was created and 1,200 ng was input to exon capture using one reaction tube of the custom Nimblegen SeqCap EZ (Roche) with custom probes targeting the coding exons of the 38 genes. Capture by hybridization was performed according to the manufacturer's protocols with the following modifications: (A) 1 nmol of a pool of blocker oligonucleotides (complementary to the barcoded adapters) was used, and (B) post-capture PCR amplification was done using the KAPA HiFi Library Amplification kit instead of the Phusion High-Fidelity PCR Master Mix with HF Buffer Kit, in a 60 μl volume, since the Kapa HiFi kit greatly reduced or eliminated the bias against GC-rich regions. The pooled capture library was quantified by Qubit (Invitrogen) and Bioanalyzer (Agilent) and sequenced on an Illumina MiSeq sequencer using the 2×150 paired-end cycle protocol. Reads were aligned to the hg19 build of the human genome using BWA with duplicate removal using samtools as implemented by Illumina MiSeq Reporter. Variant detection was performed using GATK UnifiedGenotyper. Somatic mutations were identified for paired samples using SomaticSniper and filtered for frequency of less than 3% in normal and over 3% in tumor samples. Variants were annotated with Charity annotator to identify protein-coding changes and cross-referenced against known dbSNP, 1000 Genomes, and COSMIC mutations. Sanger sequencing was used to confirm each mutation from normal and tumor DNA.
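The allele-frequency filter applied to the somatic calls can be written as a simple predicate. This is a minimal sketch, assuming that frequency means variant reads divided by total depth at the position; the function name and read counts are hypothetical and the sketch is independent of the SomaticSniper output format.

```python
def passes_frequency_filter(normal_alt, normal_depth, tumor_alt, tumor_depth,
                            threshold=0.03):
    """Keep variants below 3% in the normal sample and above 3% in the tumor."""
    if normal_depth == 0 or tumor_depth == 0:
        return False   # no coverage, frequency undefined
    normal_freq = normal_alt / normal_depth
    tumor_freq = tumor_alt / tumor_depth
    return normal_freq < threshold and tumor_freq > threshold

# Hypothetical positions given as (normal_alt, normal_depth, tumor_alt, tumor_depth)
kept = passes_frequency_filter(0, 200, 40, 180)        # clean normal, 22% in tumor
germline_like = passes_frequency_filter(10, 200, 40, 180)   # 5% in normal
```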

Modeling of LZTR-1

Structural templates for the Kelch and BTB-BACK regions of human LZTR-1 were identified with HHpred^(S6). An initial 3D model was generated with the I-TASSER server^(S7). The Cul3 N-terminal domain was docked onto the model by superposing the KLHL3^(BTB-BACK)/Cul3^(NTD) crystal structure (PDB ID 4HXI, Xi and Privé PLOS ONE 2013) onto the second LZTR-1 BTB-BACK domain. The model does not include higher quaternary structure, although many BTB domains, and many Kelch domains, are known to self-associate^(S8). The short linkage between the end of the first BACK domain and the beginning of the second BTB domain would appear to preclude an intrachain BTB-BTB pseudo-homodimer; without being bound by theory, LZTR-1 self-associates and forms higher order assemblies. Both BACK domains are the shorter, atypical form of the domain and consist of 2 helical hairpin motifs, as in SPOP^(S9,S10), and not the 4-hairpin motif seen in most BTB-BACK-Kelch proteins^(S10,S11). The model of the Kelch domain predicts an unusual 1+3 velcro arrangement^(S12), with the N-terminal region contributing strand d of blade 1 and the C-terminal region contributing strands a,b,c of the same blade, although an alternative 2+2 velcro model cannot be ruled out.

Cell Culture

SNB19 and U87 cells were cultured in DMEM supplemented with 10% Fetal Bovine Serum. Growth rate was determined by plating cells in six-well plates 3 days after infection with the lentivirus indicated in the Figure Legends. The number of viable cells was determined by trypan blue exclusion in triplicate cultures obtained from triplicate independent infections. Migration was evaluated by a wound assay: confluent cells were scratched with a pipette tip and cultured in 0.25% FBS. After 16 h, images were taken using the Olympus IX70 connected to a digital camera. Images were processed using the ImageJ64 software. The area of the cell-free wound was assessed in triplicate samples. Experiments were repeated twice.

Immunofluorescence and Western Blot

Immunofluorescence staining on brain tumor tissue microarrays was performed as previously described^(S17). Immunofluorescence microscopy was performed on cells fixed with 4% para-formaldehyde (PFA) in phosphate buffer. Cells were permeabilized using 0.2% Triton X-100. Antibodies and concentrations used in immunofluorescence staining are:

Antibody; Host; Dilution; Supplier
B-III Tubulin; Mouse; 1:400; Promega
Catenin D2; Guinea Pig; 1:500; Acris
Fibronectin; Mouse; 1:1,000; BD-Pharmingen
PSD-95; Rabbit; 1:500; Invitrogen

Secondary antibodies conjugated to Alexa Fluor 594 (Molecular Probes) were used. DNA was stained by DAPI (Sigma). Fluorescence microscopy was performed on a Nikon A1R MP microscope.

Western blot analysis of U87 cells transduced with pLOC-GFP or pLOC-CTNND2 was performed using the following antibodies:

Antibody; Host; Dilution; Supplier
Anti-Vinculin; Mouse; 1:400; SIGMA
Anti-N-Cadherin; Mouse; 1:200; BD-Pharmingen
Cyclin A; Rabbit; 1:500; Santa Cruz
P27; Mouse; 1:250; BD Transduction

Cloning and Lentiviral Production

The lentiviral expression vectors pLOC-GFP and pLOC-LZTR1 were purchased from Open Biosystems. The full length EGFR-SEPT14 cDNA was amplified from tumor sample TCGA-27-1837. Primers used were: EGFR FW: 5′-agcgATGCGACCCTCCGGGA-3′ (SEQ ID NO: 30) and SEPT14 REV: 5′-TCTTACGATGTTTGTCTTTCTTTGT-3′ (SEQ ID NO: 31). EGFR wild type, EGFRvIII and EGFR-SEPT14 cDNAs were cloned into pLOC and lentiviral particles were produced using published protocols^(S13-S15).

Genomic and mRNA RT-PCR

Total RNA was extracted from cells by using RNeasy Mini Kit (QIAGEN), following the manufacturer's instructions. 500 ng of total RNA was retro-transcribed by using the Superscript III kit (Invitrogen), following the manufacturer's instructions. The cDNAs obtained after the retro-transcription were used as templates for qPCR as described^(S13,S15). The reaction was performed with a Roche 480 thermal cycler, using the Absolute Blue QPCR SYBR Green Mix from Thermo Scientific. The relative amount of specific mRNA was normalized to GAPDH. Results are presented as the mean±SD of triplicate amplifications. The validation of fusion transcripts was performed using both genomic and RT-PCR with forward and reverse primer combinations designed within the margins of the paired-end read sequences detected by RNA-seq. Expressed fusion transcript variants were subjected to direct sequencing to confirm sequence and translation frame. Primers used for the screening of gene fusions are:

hEGFR-RT-FW1: (SEQ ID NO: 32) 5′-GGGTGACTGTTTGGGAGTTGATG-3′;
hSEP14-RT-REV1: (SEQ ID NO: 33) 5′-TGTTTGTCTTTCTTTGTATCGGTGC-3′;
hEGFR-RT-FW1: (SEQ ID NO: 34) 5′-GTGATGTCTGGAGCTACGGG-3′;
hPSPH-RT-REV1: (SEQ ID NO: 35) 5′-TGCCTGATCACATTTCCTCCA-3′;
hNFASC-RT-FW1: (SEQ ID NO: 36) 5′-AGTTCCGTGTCATTGCCATCAAC-3′;
hNTRK1-RT-REV1: (SEQ ID NO: 37) 5′-TGTTTCGTCCTTCTTCTCCACCG-3′;
hCAND1-RT-FW1: (SEQ ID NO: 38) 5′-GGAAAAAATGACATCCAGCGAC-3′;
hEGFR-RT-REV1: (SEQ ID NO: 39) 5′-TGGGTGTAAGAGGCTCCACAAG-3′.

Primers used for genomic detection of gene fusions are:

genomic EGFR-FW1: (SEQ ID NO: 40) 5′-GGATGATAGACGCAGATAGTCGCC-3′;
genomic SEPT14-REV1: (SEQ ID NO: 41) 5′-TCCAGTTGTTTTTTCTCTTCCTCG-3′;
genomic NFASC-FW1: (SEQ ID NO: 42) 5′-AAGGGAGAGGGGACCAGAAAGAAC-3′;
genomic NTRK1-REV1: (SEQ ID NO: 43) 5′-GAAAGGAAGAGGCAGGCAAAGAC-3′;
genomic CAND1-FW1: (SEQ ID NO: 44) 5′-GCAATAGCAAAACAGGAAGATGTC-3′;
genomic EGFR-REV1: (SEQ ID NO: 45) 5′-GAACACTTACCCATTCGTTGG-3′.

Subcutaneous Xenografts and Drug Treatment

Female athymic mice (nu/nu genotype, Balb/c background, 6 to 8 weeks old) were used for all antitumor studies. Patient-derived adult human glioblastoma xenografts were maintained. Xenografts were excised from host mice under sterile conditions, homogenized with the use of a tissue press/modified tissue cytosieve (Biowhitter Inc, Walkersville, MD) and tumor homogenate was loaded into a repeating Hamilton syringe (Hamilton, Co., Reno, NV) dispenser. Cells were injected subcutaneously into the right flank of the athymic mouse at an inoculation volume of 50 μl with a 19-gauge needle^(S16). Subcutaneous tumors were measured twice weekly with hand-held vernier calipers (Scientific Products, McGraw, IL). Tumor volumes (V) were calculated with the following formula: [(width)²×(length)]/2=V (mm³). For the subcutaneous tumor studies, groups of mice randomly selected by tumor volume were treated with EGFR kinase inhibitors when the median tumor volumes were on average 150 mm³ and were compared with control animals receiving vehicle (saline). Erlotinib was administered at 100 mg/kg orally daily for 10 days. Lapatinib was administered at 75 mg/kg orally twice per day for 20 days. Response to treatment was assessed by delay in tumor growth and tumor regression. Growth delay, expressed as T-C, is defined as the difference in days between the median time required for tumors in treated and control animals to reach a volume five times greater than that measured at the start of the treatment. Tumor regression is defined as a decrease in tumor volume over two successive measurements. Statistical analysis was performed using a SAS statistical analysis program, the Wilcoxon rank order test for growth delay, and Fisher's exact test for tumor regression as previously described.
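The tumor volume formula and the T-C growth delay can be illustrated with a short sketch. It assumes that the time to reach five times the starting volume is taken as the first measurement day on which that volume is reached (no interpolation between caliper measurements); the function names and the example measurements are hypothetical.

```python
from statistics import median

def tumor_volume(width_mm, length_mm):
    """V = (width^2 x length) / 2, in mm^3."""
    return (width_mm ** 2) * length_mm / 2

def days_to_fold(days, volumes, fold=5.0):
    """First measurement day on which the volume reaches fold x the starting volume.

    Returns None if the target is never reached during follow-up.
    Assumption: no interpolation between measurements.
    """
    target = fold * volumes[0]
    for day, volume in zip(days, volumes):
        if volume >= target:
            return day
    return None

def growth_delay(treated_days, control_days):
    """T-C: median time to target in treated animals minus that in controls."""
    return median(treated_days) - median(control_days)

# Hypothetical caliper reading and follow-up series
v = tumor_volume(10, 15)
t = days_to_fold([0, 3, 7, 10], [150, 300, 800, 1200])
delay = growth_delay([20, 24, 30], [8, 10, 12])
```

Animals whose tumors never reach the target would need separate handling (for example censoring) before the medians are compared, which this sketch does not attempt.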

REFERENCES

-   1 Porter, K. R., McCarthy, B. J., Freels, S., Kim, Y. & Davis, F. G. Prevalence estimates for primary brain tumors in the United States by age, gender, behavior, and histology. Neuro-oncology 12, 520-527, doi:10.1093/neuonc/nop066 (2010).
-   2 Stupp, R. et al. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. The New England journal of medicine 352, 987-996, doi:10.1056/NEJMoa043330 (2005).
-   3 Cancer Genome Atlas Research, N. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061-1068, doi:10.1038/nature07385 (2008).
-   4 Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510-522, doi:10.1016/j.ccr.2010.03.017 (2010).
-   5 Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807-1812, doi:10.1126/science.1164382 (2008).
-   6 Verhaak, R. G. et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17, 98-110, doi:10.1016/j.ccr.2009.12.020 (2010).
-   7 Bass, A. J. et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet 43, 964-968, doi:10.1038/ng.936 (2011).
-   8 Chinnaiyan, A. M. & Palanisamy, N. Chromosomal aberrations in solid tumors. Prog Mol Biol Transl Sci 95, 55-94, doi:10.1016/B978-0-12-385071-3.00004-6 (2010).
-   9 Singh, D. et al. Transforming fusions of FGFR and TACC genes in human glioblastoma. Science 337, 1231-1235, doi:10.1126/science.1220834 (2012).
-   10 Rubin, A. F. & Green, P. Mutation patterns in cancer genomes. Proc Natl Acad Sci USA 106, 21766-21770, doi:10.1073/pnas.0912499106 (2009).
-   11 Fan, Z. et al. BCOR regulates mesenchymal stem cell function by epigenetic mechanisms. Nat Cell Biol 11, 1002-1009, doi:10.1038/ncb1913 (2009).
-   12 Wamstad, J. A. & Bardwell, V. J. Characterization of Bcor expression in mouse development. Gene Expr Patterns 7, 550-557, doi:10.1016/j.modgep.2007.01.006 (2007).
-   13 Wamstad, J. A., Corcoran, C. M., Keating, A. M. & Bardwell, V. J. Role of the transcriptional corepressor Bcor in embryonic stem cell differentiation and early embryonic development. PLoS One 3, e2814, doi:10.1371/journal.pone.0002814 (2008).
-   14 Pugh, T. J. et al. Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature 488, 106-110, doi:10.1038/nature11329 (2012).
-   15 Zhang, J. et al. A novel retinoblastoma therapy from genomic and epigenetic analyses. Nature 481, 329-334, doi:10.1038/nature10733 (2012).
-   16 Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899-905, doi:10.1038/nature08822 (2010).
-   17 Kantarci, S. et al. Mutations in LRP2, which encodes the multiligand receptor megalin, cause Donnai-Barrow and facio-oculo-acoustico-renal syndromes. Nat Genet 39, 957-959, doi:10.1038/ng2063 (2007).
-   18 Willnow, T. E. et al. Defective forebrain development in mice lacking gp330/megalin. Proc Natl Acad Sci USA 93, 8460-8464 (1996).
-   19 Christ, A. et al. LRP2 is an auxiliary SHH receptor required to condition the forebrain ventral midline for inductive signals. Dev Cell 22, 268-278, doi:10.1016/j.devcel.2011.11.023 (2012).
-   20 Cowin, P. A. et al. LRP1B deletion in high-grade serous ovarian cancers is associated with acquired chemotherapy resistance to liposomal doxorubicin. Cancer Res 72, 4060-4073, doi:10.1158/0008-5472.CAN-12-0203 (2012).
-   21 Lima, F. R. et al. Glioblastoma: therapeutic challenges, what lies ahead. Biochim Biophys Acta 1826, 338-349, doi:10.1016/j.bbcan.2012.05.004 (2012).
-   22 Bekker-Jensen, S. et al. HERC2 coordinates ubiquitin-dependent assembly of DNA repair factors on damaged chromosomes. Nat Cell Biol 12, 80-86; sup pp 81-12, doi:10.1038/ncb2008 (2010).
-   23 Harlalka, G. V. et al. Mutation of HERC2 causes developmental delay with Angelman-like features. J Med Genet 50, 65-73, doi:10.1136/jmedgenet-2012-101367 (2013).
-   24 Nacak, T. G., Leptien, K., Fellner, D., Augustin, H. G. & Kroll, J. The BTB-kelch protein LZTR-1 is a novel Golgi protein that is degraded upon induction of apoptosis. J Biol Chem 281, 5065-5071, doi:10.1074/jbc.M509073200 (2006).
-   25 Stogios, P. J., Downs, G. S., Jauhal, J. J., Nandra, S. K. & Prive, G. G. Sequence and structural analysis of BTB domain proteins. Genome Biol 6, R82, doi:10.1186/gb-2005-6-10-r82 (2005).
-   26 Errington, W. J. et al. Adaptor protein self-assembly drives the control of a cullin-RING ubiquitin ligase. Structure 20, 1141-1153, doi:10.1016/j.str.2012.04.009 (2012).
-   27 Canning, P. et al. Structural basis for Cul3 assembly with the BTB-Kelch family of E3 ubiquitin ligases. J Biol Chem, doi:10.1074/jbc.M112.437996 (2013).
-   28 Lo, S. C., Li, X., Henzl, M. T., Beamer, L. J. & Hannink, M. Structure of the Keap1:Nrf2 interface provides mechanistic insight into Nrf2 signaling. EMBO J 25, 3605-3617, doi:10.1038/sj.emboj.7601243 (2006).
-   29 Boyden, L. M. et al. Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98-102, doi:10.1038/nature10814 (2012).
-   30 Louis-Dit-Picard, H. et al. KLHL3 mutations cause familial hyperkalemic hypertension by impairing ion transport in the distal nephron. Nat Genet 44, 456-460, S451-453, doi:10.1038/ng.2218 (2012).
-   31 Abu-Elneel, K. et al. A delta-catenin signaling pathway leading to dendritic protrusions. J Biol Chem 283, 32781-32791, doi:10.1074/jbc.M804688200 (2008).
-   32 Arikkath, J. et al. Delta-catenin regulates spine and synapse morphogenesis and function in hippocampal neurons during development. J Neurosci 29, 5435-5442, doi:10.1523/JNEUROSCI.0835-09.2009 (2009).
-   33 Kosik, K. S., Donahue, C. P., Israely, I., Liu, X. & Ochiishi, T. Delta-catenin at the synaptic-adherens junction. Trends Cell Biol 15, 172-178, doi:10.1016/j.tcb.2005.01.004 (2005).
-   34 Israely, I. et al. Deletion of the neuron-specific protein delta-catenin leads to severe cognitive and synaptic dysfunction. Curr Biol 14, 1657-1663, doi:10.1016/j.cub.2004.08.065 (2004).
-   35 Jun, G. et al. delta-Catenin is genetically and biologically associated with cortical cataract and future Alzheimer-related structural and functional brain changes. PLoS One 7, e43728, doi:10.1371/journal.pone.0043728 (2012).
-   36 Hicks, S., Wheeler, D. A., Plon, S. E. & Kimmel, M. Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed. Hum Mutat 32, 661-668, doi:10.1002/humu.21490 (2011).
-   37 Phillips, H. S. et al. Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9, 157-173, doi:10.1016/j.ccr.2006.02.019 (2006).
-   38 Pierotti, M. A. & Greco, A. Oncogenic rearrangements of the NTRK1/NGF receptor. Cancer Lett 232, 90-98, doi:10.1016/j.canlet.2005.07.043 (2006).
-   39 Dunn, G. P. et al. Emerging insights into the molecular and cellular basis of glioblastoma. Genes Dev 26, 756-784, doi:10.1101/gad.187922.112 (2012).
-   40 Vivanco, I. et al. Differential sensitivity of glioma- versus lung cancer-specific EGFR mutations to EGFR kinase inhibitors. Cancer Discov 2, 458-471, doi:10.1158/2159-8290.CD-11-0284 (2012).
-   41 Forbes, S. A. et al. COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer. Nucleic Acids Res 38, D652-657, doi:10.1093/nar/gkp995 (2010).
-   42 Northcott, P. A. et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 488, 49-56, doi:10.1038/nature11327 (2012).
-   43 Tiacci, E. et al. BRAF mutations in hairy-cell leukemia. The New England journal of medicine 364, 2305-2315, doi:10.1056/NEJMoa1014209 (2011).
-   44 Iyer, M. K., Chinnaiyan, A. M. & Maher, C. A. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics 27, 2903-2904, doi:10.1093/bioinformatics/btr467 (2011).
-   S1 Tiacci, E. et al. BRAF mutations in hairy-cell leukemia. The New England journal of medicine 364, 2305-2315, doi:10.1056/NEJMoa1014209 (2011).
-   S2 Singh, D. et al. Transforming fusions of FGFR and TACC genes in human glioblastoma. Science 337, 1231-1235, doi:10.1126/science.1220834 (2012).
-   S3 Iyer, M. K., Chinnaiyan, A. M. & Maher, C. A. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics 27, 2903-2904, doi:10.1093/bioinformatics/btr467 (2011).
-   S4 Vilella, A. J. et al. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 19, 327-335, doi:10.1101/gr.073585.107 (2009).
-   S5 Seal, R. L., Gordon, S. M., Lush, M. J., Wright, M. W. & Bruford, E. A. genenames.org: the HGNC resources in 2011. Nucleic Acids Res 39, D514-519, doi:10.1093/nar/gkq892 (2011).
-   S6 Soding, J. Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960, doi:10.1093/bioinformatics/bti125 (2005).
-   S7 Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc 5, 725-738, doi:10.1038/nprot.2010.5 (2010).
-   S8 Stogios, P. J., Downs, G. S., Jauhal, J. J., Nandra, S. K. & Prive, G. G. Sequence and structural analysis of BTB domain proteins. Genome Biol 6, R82, doi:10.1186/gb-2005-6-10-r82 (2005).
-   S9 Errington, W. J. et al. Adaptor protein self-assembly drives the control of a cullin-RING ubiquitin ligase. Structure 20, 1141-1153, doi:10.1016/j.str.2012.04.009 (2012).
-   S10 Zhuang, M. et al. Structures of SPOP-substrate complexes: insights into molecular architectures of BTB-Cul3 ubiquitin ligases. Mol Cell 36, 39-50, doi:10.1016/j.molcel.2009.09.022 (2009).
-   S11 Canning, P. et al. Structural basis for Cul3 assembly with the BTB-Kelch family of E3 ubiquitin ligases. J Biol Chem, doi:10.1074/jbc.M112.437996 (2013).
-   S12 Fulop, V. & Jones, D. T. Beta propellers: structural rigidity and functional diversity. Curr Opin Struct Biol 9, 715-721 (1999).
-   S13 Carro, M. S. et al. The transcriptional network for mesenchymal transformation of brain tumours. Nature 463, 318-325, doi:10.1038/nature08712 (2010).
-   S14 Niola, F. et al. Mesenchymal high-grade glioma is maintained by the ID-RAP1 axis. J Clin Invest 123, 405-417, doi:10.1172/JCI63811 (2013).
-   S15 Zhao, X. et al. The N-Myc-DLL3 cascade is suppressed by the ubiquitin ligase Huwe1 to inhibit proliferation and promote neurogenesis in the developing brain. Dev Cell 17, 210-221, doi:10.1016/j.devcel.2009.07.009 (2009).
-   S16 Friedman, H. S. et al. Experimental chemotherapy of human medulloblastoma cell lines and transplantable xenografts with bifunctional alkylating agents. Cancer Res 48, 4189-4195 (1988).
-   S17 Srivastava, M. et al. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466, 720-726, doi:10.1038/nature09201 (2010).
-   S18 Sebe-Pedros, A., Roger, A. J., Lang, F. B., King, N. & Ruiz-Trillo, I. Ancient origin of the integrin-mediated adhesion and signaling machinery. Proc Natl Acad Sci USA 107, 10142-10147, doi:10.1073/pnas.1002257107 (2010).

Example 2—Genomic Alterations in Glioblastoma

Glioblastoma remains one of the most challenging forms of cancer to treat. This example discusses a computational platform that integrates the analysis of copy number variations and somatic mutations and unravels the landscape of in frame gene fusions in glioblastoma. Mutations were found with loss of heterozygosity of LZTR-1, an adaptor of Cul3-containing E3 ligase complexes. Mutations and deletions disrupt LZTR-1 function, which restrains self-renewal and growth of glioma spheres retaining stem cell features. Loss-of-function mutations of CTNND2 target a neural-specific gene and are associated with transformation of glioma cells along the very aggressive mesenchymal phenotype. Recurrent translocations are reported that fuse the coding sequence of EGFR to several partners, with EGFR-SEPT14 as the most frequent functional gene fusion in human glioblastoma. EGFR-SEPT14 fusions activate Stat3 signaling and confer mitogen independency and sensitivity to EGFR inhibition. These results provide important insights into the pathogenesis of glioblastoma and highlight new targets for therapeutic intervention.

Glioblastoma (GBM) is the most common primary intrinsic malignant brain tumor, affecting ~10,000 new patients each year with a median survival of 12-15 months^(1,2). Identifying and understanding the functional significance of genetic alterations that drive initiation and progression of GBM is crucial to develop effective therapies. Previous efforts in GBM genome characterization identified somatic changes in well-known GBM genes (EGFR, PTEN, IDH1, TP53, NF1, etc.) and nominated putative cancer genes with somatic mutations, but the functional consequence of most alterations is unknown³⁻⁶. Furthermore, the abundance of passenger mutations and large regions of copy number variations (CNVs) complicate the definition of the landscape of driver mutations in glioblastoma. To address this challenge, a statistical approach was used to nominate driver genes in GBM by integrating somatic mutations identified by whole-exome sequencing with a CNV analysis that prioritizes focality and magnitude of the genetic alterations.

Recurrent and oncogenic gene fusions are hallmarks of hematological malignancies and have also been uncovered in solid tumors^(7,8). Recently, a small subset of GBM was reported to harbor FGFR-TACC gene fusions, indicating that patients with FGFR-TACC-positive tumors would benefit from targeted FGFR kinase inhibition⁹. It remains unknown whether gene fusions involving other RTK-coding genes exist and produce oncogene addiction in GBM. Here, a large RNA-sequencing dataset of primary GBM and Glioma Sphere Cultures (GSCs) is investigated and the global landscape of in frame gene fusions in human GBM is reported.

Nomination of Candidate GBM Genes

Without being bound by theory, integration of somatic point mutations and focal CNVs will uncover candidate driver GBM genes. MutComFocal is an algorithm designed to rank genes by an integrated recurrence, focality and mutation score (see Methods). This strategy was applied to 139 GBM and matched normal DNA analyzed by whole-exome sequencing to identify somatic mutations and 469 GBM analyzed by the Affymetrix SNP6.0 platform to identify CNVs.

The whole-exome analysis revealed a mean of 43 nonsynonymous somatic mutations per tumor sample. The distribution of substitutions shows a higher rate of transitions versus transversions (67%), with a strong preference for C->T and G->A (55%) (FIG. 6). As seen in other tumor types¹⁰, 19.2% of the mutations occurred in a CpG dinucleotide context (FIG. 7). Among somatic small nucleotide variants, the most frequently mutated genes have roles in cancer, including GBM (TP53, EGFR, PTEN, and IDH1). In addition to known cancer genes, new candidate driver genes were mutated in ~5% of tumor samples. By integrating mutational and common focal genomic lesions, MutComFocal stratified somatically mutated genes into three groups: recurrently mutated genes without significant copy number alterations (Mut), in regions of focal and recurrent amplifications (Amp-Mut) and in regions of focal and recurrent deletions (Del-Mut).
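As a non-limiting illustration of the substitution classes discussed above, the following Python sketch labels a single-nucleotide substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion, and checks whether a mutated base lies in a CpG dinucleotide on the given strand. The function names and the plain-string sequence input are assumptions of this sketch; it is not the pipeline used to produce FIG. 6 or FIG. 7.

```python
BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def in_cpg_context(sequence, position):
    """True if the base at 0-based `position` is the C or the G of a
    CpG dinucleotide in `sequence` (read 5' to 3')."""
    base = sequence[position].upper()
    if base == "C":
        return position + 1 < len(sequence) and sequence[position + 1].upper() == "G"
    if base == "G":
        return position > 0 and sequence[position - 1].upper() == "C"
    return False


def transition_fraction(substitutions):
    """Fraction of (ref, alt) pairs that are transitions."""
    classes = [substitution_class(ref, alt) for ref, alt in substitutions]
    return classes.count("transition") / len(classes)
```

Both C->T and G->A are transitions under this classification; they describe the same event read from opposite strands.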

A list of 67 genes was generated that score at the top of each of the three categories and included nearly all the genes previously implicated in GBM. Among these genes (labeled in light grey in FIG. 1) are IDH1 (Mut, FIG. 1a), PIK3C2B, MDM4, MYCN, PIK3CA, PDGFRA, KIT, EGFR, and BRAF (Amp-Mut, FIG. 1B) and PIK3R1, PTEN, RB1, TP53, NF1 and ATRX (Del-Mut, FIG. 1c). The analysis also selected 52 new candidate driver genes previously unreported in GBM. Based upon their role in CNS development and homeostasis as well as their potential function in gliomagenesis, 24 genes were selected for re-sequencing in an independent dataset of 83 GBM and matched normal controls. Eighteen genes were found somatically mutated by Sanger sequencing in the independent panel (labeled in dark grey in FIG. 1). Each validated new GBM gene is targeted by somatic mutations and CNVs in a cumulative fraction of between 2.9% and 45.7% of GBM. Furthermore, mutations of the 18 new GBM genes occur mostly in tumors with global mutation rates similar to the mean of 43 mutations per tumor and well within the 95% confidence interval, indicating that mutations of the 18 new genes do not cluster in hypermutated tumors (FIG. 2C and FIG. 9).

Among the commonly mutated and focally deleted genes exhibiting top MutComFocal scores and validated in the independent GBM dataset are BCOR, LRP family members, HERC2, LZTR-1 and CTNND2. BCOR, an X-linked gene, encodes a component of the nuclear co-repressor complex that is essential for normal development of neuroectoderm and stem cell functions¹¹⁻¹³. BCOR mutations have recently been described in retinoblastoma and medulloblastoma^(14,15). LRP1B, a member of the LDL receptor family, is among the most frequently mutated genes in human cancer (FIG. 1c)¹⁶. Interestingly, two other LDL receptor family members (LRP2 and LRP1) are mutated in 4.4% and 2.9% of tumors, respectively (FIG. 1a). The LRP proteins are highly expressed in the neuroepithelium and are essential for forebrain morphogenesis in mouse and humans^(17,18). The tumor suppressor function of LRP proteins in GBM may relate to the ability to promote chemosensitivity and control in the Sonic hedgehog signaling pathway, which is implicated in cancer initiating cells in GBM¹⁹⁻²¹. Localized on chromosome 15q13, the Hect ubiquitin ligase Herc2 gene is deleted and mutated in 15.1% and 2.2% of GBM cases, respectively. Herc2 has been implicated in severe neurodevelopmental syndromes and Herc2 substrates regulate genome stability and DNA damage-repair^(22,23).

LZTR-1 Mutations Inactivate a Cullin-3 Adaptor to Drive Self-Renewal and Growth of Glioma Spheres

A gene that received one of the highest Del-Mut scores by MutComFocal is LZTR-1 (FIG. 1c). The LZTR-1 coding region had non-synonymous mutations in 4.4%, and the LZTR-1 locus (human chromosome 22q11) was deleted in 22.4% of GBM. Among the 18 new GBM genes, LZTR-1 had the highest co-occurrence score of mutations and deletions (Fisher's exact test, p=0.0007). It also scored at the top of the list of genes whose CNVs are statistically correlated with expression (Pearson correlation between LZTR-1 CNVs and expression is 0.36, p-value<10⁻⁶ by Student's t-distribution). Finally, LZTR-1 emerged as the gene with the highest correlation for monoallelic expression of mutant alleles in tumors harboring LZTR-1 deletions (p-value=0.0007). Taken together, these findings indicate that LZTR-1 is concurrently targeted in GBM by mutations and copy number loss, fulfilling the two-hits model for tumor suppressor inactivation in cancer.
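For illustration only, a co-occurrence test of the kind referred to above can be sketched as a one-sided Fisher's exact test on a 2×2 table of tumors classified by mutation status and deletion status. The sketch below is written in plain Python; the function name, the table layout, and the choice of a one-sided test are assumptions of this illustration, since the passage above does not state whether the reported p-value was one-sided or two-sided, and no tumor counts from the study are reproduced here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

                    deleted   not deleted
        mutated        a           b
        not mutated    c           d

    Returns P(X >= a) under the hypergeometric null, i.e. the
    probability of seeing at least `a` tumors that are both mutated
    and deleted if the two events were independent.
    """
    mutated = a + b
    deleted = a + c
    total = a + b + c + d
    denominator = comb(total, deleted)
    upper = min(mutated, deleted)
    numerator = sum(
        comb(mutated, k) * comb(total - mutated, deleted - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator
```

A small p-value from such a test indicates that mutations and deletions of the same gene coincide in the same tumors more often than expected by chance, which is the pattern expected under the two-hits model.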

LZTR-1 codes for a protein with a characteristic Kelch-BTB-BACK-BTB-BACK domain architecture (FIGS. 2C, 8, 9) and is expressed in normal brain. The LZTR-1 gene is highly conserved in metazoans. Although it was initially proposed that LZTR-1 functions as a transcriptional regulator, this role was not confirmed in follow-up studies²⁴. Most proteins with BTB-BACK domains are substrate adaptors in Cullin-3 (Cul3) ubiquitin ligase complexes, in which the BTB-BACK region binds to the N-terminal domain of Cul3, while a ligand binding domain, often a Kelch 6-bladed β-propeller motif, binds to substrates targeted for ubiquitylation²⁵. To ask whether LZTR-1 directly binds Cul3, co-immunoprecipitation experiments were performed in human glioma cells. FIG. 15 shows that Cul3 immunoprecipitates contain LZTR-1, indicating that LZTR-1 is an adaptor in Cul3 ubiquitin ligase complexes.

To address the function of LZTR-1 mutants, a homology model of LZTR-1 was built based partly on the crystal structures of the MATH-BTB-BACK protein SPOP²⁶, the BTB-BACK-Kelch proteins KLHL3²⁷ and KLHL11²⁸, and the Kelch domain of Keap1²⁹ (FIG. 2b). Without being bound by theory, the second BTB-BACK region of LZTR-1 binds Cul3 because of a ϕ-X-E motif in this BTB domain, followed by a 3-Box/BACK region (FIG. 9)²⁶. However, the preceding BTB-BACK region can also participate in Cul3 binding. Five of seven LZTR-1 mutations identified in GBM are located within the Kelch domain and target highly conserved amino acids (FIG. 2b, FIG. 2C, FIG. 8). Interestingly, the concentration of LZTR-1 mutations in the Kelch domain reflects a similar pattern of mutations in the Kelch-coding region of KLHL3, recently identified in families with hypertension and electrolytic abnormalities^(30,31).

The R198G and G248R mutations localize to the b-c loop of the Kelch domain, in a region predicted to provide the substrate-binding surface²⁹. The W105R mutation targets a highly conserved anchor residue in the Kelch repeats and the T288I mutation disrupts a buried residue conserved in LZTR-1 (FIG. 2b, FIG. 2C, FIG. 8). Both mutations are expected to perturb folding of the Kelch domain. The E353STOP mutation is expected to produce a misfolded Kelch domain besides removing the C-terminal BTB-BACK regions. Located in the BTB-BACK domains, the remaining two mutations either truncate the entire BTB-BACK-BTB-BACK region (W437STOP) or are predicted to disrupt the folding of the last helical hairpin in the BTB-BACK domain (R810W, FIG. 2b).

To ask whether the mutations predicted to affect the BTB-BACK domains perturb the interaction with Cul3, in vitro translated wild type, E353STOP, W437STOP and R810W LZTR-1 Myc-tagged proteins were prepared and their ability to bind to Flag-Cul3 purified from mammalian cells was tested. Wild type LZTR-1 bound Flag-Cul3, but the E353STOP and W437STOP mutants lost this property. However, the R810W mutant retained Cul3 binding in this assay (FIG. 16A). Besides promoting ubiquitin-mediated degradation of substrates, Cullin adaptors are short-lived proteins that undergo auto-ubiquitylation and destruction by the same Cullin complexes that direct substrate ubiquitylation³²⁻³⁴. Thus, impaired ubiquitin ligase activity of the LZTR-1-Cul3 complex should result in accumulation of mutant LZTR-1 proteins. Each of the three LZTR-1 mutants predicted to compromise integrity of the BTB-BACK domains accumulated at higher levels than wild-type LZTR-1 in transient transfection assays (FIG. 16B). The steady state and half-life of the LZTR-1 R810W mutant protein were markedly increased, in the absence of changes of the mutant mRNA (FIG. 16C-D). Thus, as for the two truncated mutants, the R810W mutation compromised protein degradation.

Next, the biological consequences of LZTR-1 inactivation in human GBM were examined. Comparison of the gene expression patterns of GBM harboring mutations and deletions of LZTR-1 with those of GBM with normal LZTR-1 revealed that tumors with genetic inactivation of LZTR-1 were enriched for genes associated with glioma sphere growth and proliferation³⁵ (FIG. 17A). Introduction of LZTR-1 in three independent GBM-derived sphere cultures resulted in strong inhibition of glioma sphere formation and expression of glioma stem cell markers (FIG. 17B-E). LZTR-1 also decreased the size of tumor spheres, induced a flat and adherent phenotype and reduced proteins associated with cell cycle progression (cyclin A, PLK1, p107, FIG. 17D-E). Interestingly, both R810W and W437STOP LZTR-1 mutations abolished the ability of LZTR-1 to impair glioma sphere formation (FIG. 17F). The above experiments indicate that LZTR-1 inactivation in human GBM drives self-renewal and growth of glioma spheres.

Inactivation of CTNND2 Induces Mesenchymal Transformation in Glioblastoma

Among the top ranking genes in MutComFocal, CTNND2 is expressed at the highest levels in normal brain. CTNND2 codes for δ-catenin, a member of the p120 subfamily of catenins expressed almost exclusively in the nervous system where it is crucial for neurite elongation, dendritic morphogenesis and synaptic plasticity³⁶⁻³⁸. Germ-line hemizygous loss of CTNND2 impairs cognitive functions and underlies some forms of mental retardation^(39,40). CTNND2 shows pronounced clustering of mutations in GBM. The observed spectrum of mutations includes four mutations in the armadillo-coding domain and one in the region coding for the N-terminal coiled-coil domain (FIG. 10A), the two most relevant functional domains of δ-catenin. Each mutation targets highly conserved residues with probably (K629Q, A776T, S881L, D999E) and possibly (A71T) damaging consequences⁴¹. GBM harbors focal genomic losses of CTNND2, and deletions correlate with loss of CTNND2 expression (FIG. 10B).

Immunostaining experiments showed that δ-catenin is strongly expressed in normal brain, particularly in neurons, as demonstrated by co-staining with the neuronal markers β3-tubulin and MAP2 but not the astrocytic marker GFAP (FIG. 18A-B). Conversely, immunostaining of 69 GBM and western blot of 9 glioma sphere cultures revealed negligible or absent expression of δ-catenin in 21 tumors and in most glioma sphere cultures (FIG. 22). Oncogenic transformation in the CNS frequently disrupts the default proneural cell fate and induces an aberrant mesenchymal phenotype associated with aggressive clinical outcome⁴². Gene expression analysis of 498 GBM from ATLAS-TCGA showed that low CTNND2 expression is strongly enriched in tumors exhibiting the mesenchymal gene expression signature (t-test p-value=2.4×10⁻¹², FIG. 10D). Tumors with reduced CTNND2 were characterized by poor clinical outcome and, among them, tumors with CTNND2 copy number loss displayed the worst prognosis (FIG. 3C-D). Patients with low CTNND2 expression showed the worst clinical outcome in mesenchymal GBM, though non-mesenchymal tumors also demonstrated poor prognosis, albeit with reduced strength (FIG. 3D).

Mesenchymal transformation of GBM is associated with irreversible loss of proneural cell fate and neuronal markers⁴² and is detected in most established glioma cell lines. Expression of δ-catenin in the U87 human glioma cell line reduced cell proliferation (FIG. 3E), elevated expression of neuronal proteins βIII-tubulin, PSD95 (a post-synaptic marker) and N-cadherin (FIG. 3G, FIG. 23A) and decreased mRNA and protein levels of mesenchymal markers (FIG. 3F, FIG. 18C, FIG. 23A). These effects were associated with morphologic changes characterized by neurite extension and development of branched dendritic processes (FIG. 3F, FIG. 23B-23C). Conversely, expression of the A776T, K629Q and D999E mutants of CTNND2 failed to induce neuronal features and down-regulate the mesenchymal marker fibronectin (FBN, FIG. 18C, FIG. 23B-23C). Consistent with δ-catenin inhibition of cell proliferation in glioma cells, only wild type δ-catenin decreased cyclin A, an S-phase cyclin (FIG. 18C).

Next, the effect of expressing δ-catenin in GBM-derived sphere culture #48, which lacks the endogenous δ-catenin protein (FIG. 22B) and expresses high levels of mesenchymal markers, was analyzed⁴³. Introduction of δ-catenin in sphere culture #48 strongly reduced mesenchymal proteins smooth muscle actin (SMA), collagen-5A1 (Col5A1) and FBN, as measured by quantitative immunofluorescence (FIGS. 19A-B). It also induced βIII-tubulin more than eight-fold (FIGS. 19C-D). Time course analysis showed the highest degree of βIII-tubulin-positive neurite extension at 4-6 days post-transduction followed by progressive depletion of neuronal-like cells from culture (FIG. 19D). Finally, whether δ-catenin impacts self-renewal and growth of glioma spheres in vitro and their ability to grow as tumor masses in vivo was examined. In a limiting dilution assay, δ-catenin inhibited glioma sphere formation more than 8-fold (FIG. 19E). To determine the effect of δ-catenin on brain tumorigenesis in vivo, #48 glioma sphere cultures were generated expressing luciferase and bioluminescence imaging was conducted at different times after stereotactic transduction of control and δ-catenin-expressing cells in the mouse brain. When compared to controls, a 5-fold inhibition of tumor growth by δ-catenin was observed at each time point analyzed (FIG. 19F, FIG. 23D). These results identify CTNND2 inactivation as a key genetic alteration driving the aggressive mesenchymal phenotype of GBM.

Recurrent EGFR Fusions in GBM

To identify gene fusions in GBM, RNA-seq data was analyzed from a total of 185 GBM samples (161 primary GBM plus 24 short-term glioma sphere cultures freshly isolated from patients carrying primary GBM). The analysis of RNA-seq led to the discovery of 92 candidate rearrangements giving rise to in-frame fusion transcripts (FIG. 27). Besides previously reported FGFR3-TACC3 fusion events, the most frequent recurrent in-frame fusions involved EGFR in 7.6% of samples (14/185, 3.8%-11.3% CI). Nine of 14 EGFR fusions included recurrent partners SEPT14 (6/185, 3.2%) and PSPH (3/185, 1.6%) as the 3′ gene segment in the fusion. All EGFR-SEPT14 and two of three EGFR-PSPH gene fusions occurred within amplified regions of the fusion genes (FIG. 24).
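As a worked illustration of the frequency reported above, the following Python sketch computes the proportion 14/185 together with a normal-approximation (Wald) 95% confidence interval. The passage above does not state how the reported interval was derived, so the Wald method, the function name, and the default z value of 1.96 are assumptions of this sketch; under those assumptions the interval is close to the reported 3.8%-11.3%.

```python
from math import sqrt


def proportion_with_wald_ci(events, total, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation
    (Wald) interval, clipped to the range [0, 1]."""
    p = events / total
    half_width = z * sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# EGFR in-frame fusions: 14 of 185 samples, as stated in the text.
frequency, lower, upper = proportion_with_wald_ci(14, 185)
print(f"{frequency:.1%} ({lower:.1%}-{upper:.1%})")
```

For small counts such as 3/185, the Wald interval can be unreliable, and an exact or Wilson interval may be preferable.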

The quantitative analysis of expressed reads spanning the fusion breakpoint versus reads spanning EGFR exons not implicated in the fusion transcripts revealed that EGFR fusion genes were expressed at higher levels in five of nine tumors (FIG. 30). Two in-frame highly expressed fusions were also found involving the neurotrophic tyrosine kinase receptor 1 gene (NTRK1) as the 3′ gene with two different 5′ partners (NFASC-NTRK1 and BCAN-NTRK1). Fusions involving NTRK1 are common in papillary thyroid carcinomas⁴⁴. Using EXomeFuse, an algorithm that reconstructs genomic fusions from whole-exome data, it was found that EGFR-SEPT14 and NTRK1 fusions result from recurrent chromosomal translocations, and the corresponding genomic breakpoints were reconstructed (FIG. 31).

The sequence of the PCR products spanning the fusion breakpoint validated all three types of recurrent in frame fusion predictions (EGFR-SEPT14, EGFR-PSPH and NTRK1 fusions, FIGS. 4, 11, 12). In FIGS. 4A-B, the prediction and cDNA sequence validation are shown, respectively, for one tumor harboring an EGFR-SEPT14 fusion (TCGA-27-1837). The amplified cDNA contained an open reading frame for a 1,041 amino-acid protein resulting from the fusion of EGFR residues 1-982 with SEPT14 residues 373-432 (FIG. 4C). Thus, the structure of EGFR-Septin14 fusions involves EGFR at the N-terminus, providing a receptor tyrosine kinase domain fused to a coiled-coil domain from Septin14. Exon-specific RNA-seq expression in TCGA-27-1837 demonstrated that EGFR and SEPT14 exons implicated in the fusion are highly expressed compared with mRNA sequences not included in the fusion event (FIG. 13).

Using PCR, the genomic breakpoint was mapped to chromosome 7 (#55,268,937 for EGFR and #55,870,909 for SEPT14, genome build GRCh37/hg19) within EGFR exon 25 and SEPT14 intron 9, creating a transcript in which the 5′ EGFR exon 24 is spliced to the 3′ SEPT14 exon 10 (FIG. 4D). Interestingly, the fused EGFR-PSPH cDNA and predicted fusion protein in sample TCGA-06-5408 involve the same EGFR N-terminal region implicated in the EGFR-SEPT14 fusion, with PSPH providing a carboxy-terminal portion of 35 amino acids (FIG. 11). An example of a fusion in which the EGFR-TK region is the 3′ partner is the CAND1-EGFR fusion in the glioma sphere culture #16 (FIG. 14). Each fusion transcript includes the region of the EGFR mRNA coding for the TK domain (FIG. 27). RT-PCR and genomic PCR followed by Sanger sequencing of GBM TCGA-06-5411 validated the NFASC-NTRK1 fusion, in which the predicted fusion protein includes the TK domain of the high-affinity NGF receptor (TrkA) fused downstream to the immunoglobulin-like region of the cell adhesion and ankyrin-binding region of neurofascin (FIG. 12).

To confirm that GBM harbors recurrent EGFR fusions and determine the frequency in an independent dataset, cDNA from a panel of 248 GBMs was screened, and 10 additional cases with EGFR-SEPT14 fusions were discovered (4%). Conversely, NFASC-NTRK1 fusions were not detected in this dataset. A 2.2% (3/135) frequency of EGFR-PSPH fusions was determined.

The discovery of recurrent EGFR fusions in GBM is of particular interest. EGFR is activated in a significant fraction of primary GBM (~25%) by an in-frame deletion of exons 2-7 (EGFRvIII)⁴⁵. However, seven of nine tumors harboring EGFR-SEPT14 and EGFR-PSPH gene fusions lacked the EGFRvIII rearrangement (FIG. 32). It was therefore investigated whether the most frequent EGFR fusion in GBM (EGFR-SEPT14) provides an alternative mechanism of EGFR activation and confers sensitivity to EGFR inhibition. First, whether EGFR gene fusions cluster into any gene expression subtype of GBM (proneural, neural, classical, mesenchymal) was investigated. Although no individual subtype displayed a statistically significant enrichment of EGFR fusions, 8 of 9 GBM harboring EGFR-SEPT14 or EGFR-PSPH belonged to the classical or mesenchymal subtype (Fisher's P value=0.05 for classical/mesenchymal enrichment, FIG. 33).

Next, the effects of ectopic EGFR-SEPT14, EGFRvIII or EGFR wild type on glioma cells were investigated. Lentiviral transduction of #48 human glioma sphere culture (which lacks genomic alteration of EGFR) showed that cells expressing EGFR-SEPT14 or EGFRvIII but not those expressing wild type EGFR or vector retained growth and self-renewal in the absence of EGF and bFGF (FIG. 20A). Accordingly, established glioma cell lines expressing EGFR-SEPT14 or EGFRvIII proliferated at a higher rate than control cells or cells expressing wild type EGFR (FIG. 5A, FIG. 25). Furthermore, EGFR-SEPT14 and EGFRvIII markedly enhanced migration of glioma cells in a wound assay (FIG. 5B-C). The above findings indicate that EGFR-SEPT14 might constitutively activate signaling events downstream of EGFR. When analyzed in the presence and absence of mitogens, the expression of EGFR-SEPT14 (or EGFRvIII) in glioma sphere cultures #48 triggered constitutive activation of phospho-STAT3 but had no effects on phospho-ERK and phospho-AKT (FIG. 20B-C). This is consistent with enrichment of STAT3-target genes in primary human GBM harboring EGFR-SEPT14 fusions compared with tumors carrying wild type EGFR (FIG. 20D). Differential gene expression analysis identified a set of 9 genes up-regulated in EGFR-SEPT14 tumors compared with EGFRvIII-positive GBM (FIG. 26). These genes broadly relate to inflammatory/immune response, and some code for chemokines (CXCL9, 10, 11) that have been associated with aggressive glioma phenotypes⁴⁶.

Finally, it was investigated whether EGFR-SEPT14 fusions confer sensitivity to inhibition of EGFR-TK. Treatment of #48 cells expressing EGFR-SEPT14, EGFRvIII, wild-type EGFR or vector control with lapatinib, an irreversible EGFR inhibitor recently proposed to target EGFR alterations in GBM⁴⁷, revealed that EGFR-SEPT14 and EGFRvIII, but not wild-type EGFR, sensitized glioma cells to pharmacological EGFR inhibition (FIG. 20E). Similar effects were obtained following treatment of #48-derivatives with erlotinib, another inhibitor of EGFR-TK (FIG. 5D).

To ask whether sensitivity to EGFR-TK inhibition is retained in human glioma cells naturally harboring EGFR-SEPT14 in vivo, an EGFR-SEPT14-positive GBM xenograft (D08-0537 MG) established from a heavily pretreated patient was used. Treatment of D08-0537 MG tumors with lapatinib or erlotinib showed that both drugs significantly delayed tumor growth, with lapatinib displaying the stronger anti-tumor effect. Conversely, EGFR inhibitors were ineffective against GBM xenograft D08-0714 MG, which lacks EGFR genomic alterations (FIG. 5E). Taken together, these data indicate that EGFR-SEPT14 fusions confer mitogen-independent growth, constitutively activate STAT3 signaling and impart sensitivity to EGFR kinase inhibition to glioma cells harboring the fusion gene.

Discussion

A computational pipeline was described that combines the frequency, magnitude and focality of CNVs at any locus in the human genome with the somatic mutation rate for genes residing at that genomic location, thus integrating into a single score two genetic hallmarks of driver cancer genes (focality of CNVs and point mutations). Besides recognizing nearly all genes known to have functional relevance in GBM, this study discovered and validated somatic mutations in 18 new genes, which also harbor focal and recurrent CNVs in a significant fraction of GBM. The importance of some of these genes extends beyond GBM, as underscored by cross-tumor relevance (e.g. BCOR), and protein family recurrence (e.g. LRP family members).

Also, the LZTR-1 mutations targeting highly conserved residues in the Kelch domain (W105, G248, T288) and in the second BTB-BACK domain (R810) are recurrent events in other tumor types⁴⁸. Thus, understanding the nature of substrates of LZTR-1-Cul3 ubiquitin ligase activity will provide important insights into the pathogenesis of multiple cancer types. The importance of LZTR-1 genetic alterations in GBM is underscored by concurrent targeting of LZTR-1 by mutations and deletions, which supports a two-hit mechanism of tumor suppressor gene inactivation, as well as by the impact of mutations targeting the BTB-BACK domains on Cul3 binding and/or protein stability, and their ability to release glioma cells from the restraining activity of the wild-type protein on self-renewal.

The finding that loss-of-function alterations of CTNND2 cluster in mesenchymal GBM provides a clue to the genetic events driving this aggressive GBM subtype. The function of δ-catenin in crucial neuronal morphogenesis indicates that full-blown mesenchymal transformation in the brain requires loss of master regulators constraining cell determination along the neuronal lineage. Introduction of δ-catenin in human glioma spheres collapsed the mesenchymal phenotype and inhibited sphere formation and tumor growth. Thus, the ability of δ-catenin to reprogram glioma cells expressing mesenchymal genes towards a neuronal fate reveals an unexpected plasticity of mesenchymal GBM that might be exploited therapeutically.

In this study, the landscape of gene fusions from a large dataset of GBM analyzed by RNA-Sequencing is also reported. In-frame gene fusions retaining the RTK-coding domain of EGFR emerged as the most frequent gene fusion in GBM. In this tumor, EGFR is frequently targeted by focal amplifications and our finding underscores the strong recombinogenic probability of focally amplified genes, as recently reported for the myc locus in medulloblastoma⁴⁹. Resembling intragenic rearrangements that generate the EGFRvIII allele, EGFR-SEPT14 fusions impart to glioma cells the ability to self-renew and grow in the absence of mitogens, constitutively activate STAT3 signaling, and confer sensitivity to EGFR inhibition. These findings highlight the relevance of fusions implicating RTK-coding genes in the pathogenesis of GBM⁹. They also provide a strong rationale for the inclusion of GBM patients harboring EGFR fusions in clinical trials based on EGFR inhibitors.

Methods

SAVI (statistical algorithm for variant frequency identification). The frequencies of variant alleles were estimated in 139 paired tumor and normal whole-exome samples from TCGA using the SAVI pipeline⁵⁰. The algorithm estimates the frequency of variant alleles by constructing an empirical Bayesian prior for those frequencies, using data from the whole sample, and obtains a posterior distribution and high-credibility intervals for each allele⁵⁰. The prior and posterior are distributed over a discrete set of frequencies with a precision of 1% and are connected by a modified binomial likelihood, which allows for some error rate. More precisely, a prior distribution p(f) of the frequency f and a prior for the error e uniform on the interval [0,E] for a fixed 0≤E≤1 are assumed. The sequencing data at a particular allele are treated as a random experiment producing a string of m bits (the total depth at the allele) with n ‘1’s (the variant depth at the allele). Assuming a binomial likelihood of the data and allowing for bits being misread because of random errors, the posterior probability P(f) of the frequency f is

$P(f) = \frac{p(f)}{C} \cdot \frac{1}{b - a} \int_{f}^{f + E - 2Ef} x^{n} (1 - x)^{m - n} \, dx$

where C is a normalization constant. For a particular allele, the value of E is determined by the quality of the nucleotides sequenced at that position as specified by their Phred scores. The SAVI pipeline takes as input the reads produced by the sequencing technology, filters out low-quality reads and maps the rest onto a human reference genome. After mapping, a Bayesian prior for the distribution of allele frequencies for each sample is constructed by an iterative posterior update procedure starting with a uniform prior. To genotype the sample, the posterior high-credibility intervals were used for the frequency of the alleles at each genomic location. Alternatively, combining the Bayesian priors from different samples, posterior high-credibility intervals were obtained for the difference between the samples of the frequencies of each allele. Finally, the statistically significant differences between the tumor and normal samples are reported as somatic variants. To estimate the positive predictive value of SAVI in the TCGA GBM samples, 41 mutations were selected for independent validation by Sanger sequencing. Sanger sequencing confirmed 39 of the 41 mutations, resulting in a 0.95 (95% CI 0.83-0.99) validation rate.
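The posterior above can be illustrated with a minimal numerical sketch. It assumes that a and b denote the limits of integration (a = f, b = f + E − 2Ef), so that the factor 1/(b − a) turns the integral into an average over the error e uniform on [0,E]; this reading is an inference and is not stated in the text. The function name `savi_posterior`, the uniform default prior and the midpoint-rule integration are illustrative choices and are not taken from the SAVI pipeline.

```python
def savi_posterior(n, m, E, prior=None, steps=200):
    """Posterior over allele frequencies f = 0.00, 0.01, ..., 1.00.

    n: variant depth, m: total depth, E: upper bound of the error rate.
    """
    grid = [i / 100 for i in range(101)]          # 1% precision, as in the text
    if prior is None:
        prior = [1.0 / len(grid)] * len(grid)     # assumption: uniform prior

    def likelihood(f):
        # Average x^n (1-x)^(m-n) over e ~ Uniform[0, E], where
        # x = f + e - 2*e*f is the chance that a bit is read as '1'.
        if E == 0:
            xs = [f]
        else:
            xs = [f + (k + 0.5) / steps * E * (1 - 2 * f) for k in range(steps)]
        return sum(x ** n * (1 - x) ** (m - n) for x in xs) / len(xs)

    weights = [p * likelihood(f) for p, f in zip(prior, grid)]
    C = sum(weights)                              # normalization constant
    return grid, [w / C for w in weights]


grid, post = savi_posterior(n=12, m=40, E=0.01)
mode = grid[post.index(max(post))]                # most probable frequency
```

Passing the posterior of one iteration as the `prior` of the next would mimic the iterative update mentioned above, although the actual update schedule used by SAVI is not described here.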

Candidate genes were ranked by the number of somatic nonsynonymous mutations. A robust fit of the ratio of nonsynonymous to synonymous mutations was generated with a bisquare weighting function. The excess of nonsynonymous alterations was estimated using a Poisson distribution with a mean equal to the product of the ratio from the robust fit and the number of synonymous mutations. Genes in highly polymorphic genomic regions were filtered out based on an independent cohort of normal samples. The list of these regions includes families of genes known to generate false positives in somatic predictions (for example, the HLA, KRT and OR gene families).
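As a rough illustration of the Poisson step, the sketch below computes the probability of seeing at least the observed number of nonsynonymous mutations when the expected count is the fitted ratio times the number of synonymous mutations. The robust bisquare fit itself is not reproduced; the ratio is simply passed in. The function names and the use of an upper-tail probability as the measure of excess are assumptions.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)        # P(X = 0)
    cdf = 0.0
    for i in range(k):           # accumulate P(X = 0) ... P(X = k-1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def nonsynonymous_excess_p(n_nonsyn, n_syn, fitted_ratio):
    # Expected nonsynonymous count = fitted ratio x synonymous count (from the text).
    return poisson_upper_tail(n_nonsyn, fitted_ratio * n_syn)
```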

MutComFocal. Key cancer genes are often amplified or deleted in chromosomal regions containing many other genes. Point mutations and gene fusions, conversely, provide more specific information about which genes may be implicated in the oncogenic process. MutComFocal, a Bayesian approach that assigns a driver score to each gene by integrating point mutations and CNV data from 469 GBMs (Affymetrix SNP6.0), was developed. In general, MutComFocal uses three different strategies. First, the focality component of the score is inversely proportional to the size of the genomic lesion to which a gene belongs and thus prioritizes more focal genomic lesions. Second, the recurrence component of the MutComFocal score is inversely proportional to the total number of genes altered in a sample, which prioritizes samples with a smaller number of altered genes. Third, the mutation component of the score is inversely proportional to the total number of genes mutated in a sample, which achieves the twofold goal of prioritizing mutated genes on one hand and prioritizing samples with a smaller number of mutations on the other.

More specifically, for a particular sample, let (c₁,N₁), . . . , (c_(k),N_(k)) describe the amplification lesions in that sample so that N_(i) is the number of genes in the ith lesion and c_(i) is its copy number change from normal. For a gene belonging to the ith lesion, the amplification recurrence sample score is defined as 1/(ΣN_(j)), and its amplification focality sample score is defined as (c_(i)/Σc_(j))×(1/N_(i)). To obtain the amplification recurrence and focality scores for a particular gene, the corresponding sample scores were summed over all the samples and the result was normalized so that each score sums to 1. The deletion recurrence and focality scores are defined in a similar manner. The mutation score is analogous to a recurrence score in which it is assumed that mutated genes belong to lesions with only one gene.
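The per-sample and per-gene scores can be sketched as follows. The focality sample score follows the formula given above. The recurrence sample score is taken here to be 1/ΣN_j, an assumption based on the description of the recurrence component as inversely proportional to the total number of genes altered in a sample. The data layout, function names and gene labels are hypothetical.

```python
from collections import defaultdict


def sample_scores(lesions):
    """lesions: list of (copy_change, [gene, ...]) for one sample.

    Returns {gene: (recurrence_sample_score, focality_sample_score)}.
    A gene is assumed to appear in at most one lesion per sample.
    """
    total_genes = sum(len(genes) for _, genes in lesions)    # sum of N_j
    total_change = sum(c for c, _ in lesions)                 # sum of c_j
    scores = {}
    for c, genes in lesions:
        for g in genes:
            recurrence = 1.0 / total_genes                    # assumed 1 / sum(N_j)
            focality = (c / total_change) * (1.0 / len(genes))
            scores[g] = (recurrence, focality)
    return scores


def gene_scores(samples):
    """Sum the sample scores over all samples and normalize each score to 1."""
    rec, foc = defaultdict(float), defaultdict(float)
    for lesions in samples:
        for g, (r, f) in sample_scores(lesions).items():
            rec[g] += r
            foc[g] += f
    rec_total, foc_total = sum(rec.values()), sum(foc.values())
    return ({g: v / rec_total for g, v in rec.items()},
            {g: v / foc_total for g, v in foc.items()})


# Toy input: two samples with made-up copy number changes.
samples = [
    [(2.0, ["EGFR"]), (1.0, ["A", "B", "C"])],
    [(1.0, ["EGFR", "SEPT14"])],
]
recurrence, focality = gene_scores(samples)
```

In this toy input the gene that sits alone in a high-amplitude lesion receives the largest focality score, which is the behavior the focality component is meant to capture.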

The amplification/mutation score is defined as the product of the two amplification scores and the mutation score, whereas the deletion/mutation score is defined as the product of the two deletion scores and the mutation score. The amplification/mutation and deletion/mutation scores are normalized to 1, and for each score, genes are divided into tiers iteratively so that the top 2^(H) remaining genes are included in the next tier, where H is the entropy of the scores of the remaining genes normalized to 1. On the basis of their tier across the different types of scores, genes are assigned to being either deleted/mutated or amplified/mutated, and genes in the top tiers are grouped into contiguous regions. The top genes in each region are considered manually and selected for further functional validation.
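The iterative tiering can be sketched as below, under the assumptions that the entropy is measured in bits (base-2 logarithm), that the tier size 2^H is rounded to the nearest integer, and that genes with a zero score are left out. None of these details is specified in the text, and the function name is illustrative.

```python
import math


def entropy_tiers(scores):
    """scores: {gene: non-negative score}. Returns a list of tiers (lists of genes)."""
    # Rank genes by decreasing score; zero-score genes are dropped (assumption).
    remaining = sorted((g for g in scores if scores[g] > 0),
                       key=lambda g: scores[g], reverse=True)
    tiers = []
    while remaining:
        total = sum(scores[g] for g in remaining)
        probs = [scores[g] / total for g in remaining]      # renormalize to 1
        H = -sum(p * math.log2(p) for p in probs)           # entropy in bits
        size = max(1, min(len(remaining), round(2 ** H)))   # top 2^H genes
        tiers.append(remaining[:size])
        remaining = remaining[size:]
    return tiers
```

When one gene dominates the score distribution the entropy is close to zero and the tier holds that gene alone; when the scores are flat the tier widens to cover all of the equally ranked genes.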

The recurrence and focality scores can be interpreted as the posterior probabilities that a gene is driving the selection of the disease under two different priors, one global and one local in nature. The recurrence score is higher if a gene participates in many samples that do not have too many altered genes, whereas the focality score is higher if the gene participates in many focal lesions. Besides lending strong support to the inference of a gene as a potential driver, the directionality of the copy number alteration (amplification or deletion) informs the probable behavior of the candidate gene as an oncogene or tumor suppressor, respectively.

The genes displayed in FIG. 1 were selected on the basis of the MutComFocal ranking (top 250 genes), the size of the minimal region (less than 10 genes) and the frequency of mutations (more than 2% for deletion/mutations and at least 1% for amplification/mutations).

RNA-seq bioinformatics analysis. 161 RNA-seq GBM tumor samples from TCGA, a public repository containing large-scale genome sequencing of different cancers, plus 24 patient-derived GSCs were analyzed. Nine GSC samples reported in previous studies were kept in our analysis to evaluate recurrence⁹. The samples were analyzed using the ChimeraScan⁵¹ algorithm to detect a list of gene fusion candidates. Briefly, ChimeraScan detects those reads that discordantly align to different transcripts of the same reference (split inserts). These reads provide an initial set of putative fusion candidates. The algorithm then realigns the initially unmapped reads to the putative fusion candidates and detects those reads that align across the junction boundary (split reads). These reads provide the genomic coordinates of the breakpoint.

RNA-seq analysis detected a total of 39,329 putative gene fusion events. To focus the experimental analysis on biologically relevant fused transcripts, the Pegasus annotation pipeline (http://sourceforge.net/projects/pegasus-fus/) was applied. For each putative fusion, Pegasus reconstructs the entire fusion sequence on the basis of the genomic fusion breakpoint coordinates and gene annotations. Pegasus also annotates the reading frame of the resulting fusion sequences as either in frame or frame shifted. Moreover, Pegasus detects the protein domains that are either conserved or lost in the new chimeric event by predicting the amino acid sequence and automatically querying the UniProt web service. On the basis of the Pegasus annotation report, relevant gene fusions were selected for further experimental validation according to the reading frame and the conserved and lost domains. The selected list was based on in-frame events expressed by ten or more reads, with at least one read spanning the breakpoint. To filter out candidate trans-splicing events, only events with putative breakpoints at a distance of at least 25 kb were pursued.
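The selection criteria in the preceding paragraph can be expressed as a simple filter. The record type, its field names and the example coordinates below are hypothetical and do not correspond to the Pegasus report format. Treating breakpoints on different chromosomes as passing the distance filter is also an assumption, since the text only gives the 25 kb threshold.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    # Hypothetical record; field names are illustrative only.
    name: str
    in_frame: bool
    total_reads: int       # reads supporting the fusion
    spanning_reads: int    # reads spanning the breakpoint
    chrom_5p: str
    pos_5p: int
    chrom_3p: str
    pos_3p: int


def passes_filters(c, min_reads=10, min_spanning=1, min_distance=25_000):
    if not c.in_frame:
        return False
    if c.total_reads < min_reads or c.spanning_reads < min_spanning:
        return False
    if c.chrom_5p != c.chrom_3p:
        return True   # assumption: inter-chromosomal events pass the distance filter
    return abs(c.pos_5p - c.pos_3p) >= min_distance


# Made-up coordinates, for illustration only.
candidates = [
    FusionCandidate("GENE1-GENE2", True, 25, 4, "chr7", 55_200_000, "chr7", 55_900_000),
    FusionCandidate("GENE3-GENE4", True, 40, 6, "chr1", 1_000_000, "chr1", 1_010_000),
    FusionCandidate("GENE5-GENE6", False, 90, 9, "chr2", 5_000_000, "chr9", 7_000_000),
]
selected = [c.name for c in candidates if passes_filters(c)]
```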

Identification of genetic rearrangements using whole-exome data. Although whole-exome sequencing data contain low intronic coverage that reduces the sensitivity for fusion discovery, they are readily available through the TCGA database. To characterize the genomic breakpoint of the chromosomal rearrangement, EXome-Fuse, a new gene fusion discovery pipeline tailored to whole-exome data, was designed. For the samples harboring EGFR-SEPT14, EGFR-PSPH, NFASC-NTRK1 and BCAN-NTRK1 fusions in RNA, EXome-Fuse was applied to the corresponding whole-exome sequencing data deposited in TCGA. This algorithm can be divided into three stages: split-insert identification, split-read identification and virtual reference alignment. Mapping against the human genome reference hg18 with BWA, all split inserts are first identified to compile a preliminary list of fusion candidates. This list was pruned of any false positives produced from paralogous gene pairs using the Duplicated Genes Database and the Ensembl Compara GeneTrees⁵². Pseudogenes in the candidate list were annotated using the list from the HUGO Gene Nomenclature Committee (HGNC) database⁵³ and were given lower priority. Candidates between homologous genes were also filtered out, as were those with homologous or low-complexity regions around the breakpoint. For the remaining fusion candidates, any supporting split reads and their mates were probed for using BLAST with a word size of 16, an identity cutoff of 90% and an expectation cutoff of 10⁻⁴. A virtual reference was created for each fusion transcript and all reads were realigned to calculate a final tally of split inserts and split reads such that all aligning read pairs maintain forward-reverse directionality.

Targeted exon sequencing. All protein-coding exons for the 24 genes of interest were sequenced using genomic DNA extracted from frozen tumors and matched blood. Five-hundred nanograms of DNA from each sample were sheared to an average size of 150 bp in a Covaris instrument for 360 s (duty cycle, 10%; intensity, 5; cycles per burst, 200). Bar-coded libraries were prepared using the Kapa High-Throughput Library Preparation Kit Standard (Kapa Biosystems). Libraries were amplified using the KAPA HiFi Library Amplification kit (Kapa Biosystems) (eight cycles). Libraries were quantified using Qubit Fluorimetric Quantitation (Invitrogen), and the quality and size were assessed using an Agilent Bioanalyzer. An equimolar pool of the four bar-coded libraries (300 ng each) was created, and 1,200 ng was input to exon capture using one reaction tube of the custom Nimblegen SeqCap EZ (Roche) with custom probes targeting the coding exons of the 38 genes. Capture by hybridization was performed according to the manufacturer's protocols with the following modifications: 1 nmol of a pool of blocker oligonucleotides (complementary to the bar-coded adapters) was used, and post-capture PCR amplification was done using the KAPA HiFi Library Amplification kit, instead of the Phusion High-Fidelity PCR Master Mix with HF Buffer Kit, in a 60 μl volume, as the Kapa HiFi kit greatly reduced or eliminated the bias against GC-rich regions.

The pooled capture library was quantified by Qubit (Invitrogen) and Bioanalyzer (Agilent) and sequenced on an Illumina MiSeq sequencer using the 2×150 paired-end cycle protocol. Reads were aligned to the hg19 build of the human genome using BWA with duplicate removal using SAMtools as implemented by Illumina MiSeq Reporter. Variant detection was performed using GATK UnifiedGenotyper. Somatic mutations were identified for paired samples using SomaticSniper and filtered for frequency of less than 3% in normal samples and over 3% in tumor samples. Variants were annotated with the Charity annotator to identify protein-coding changes and cross-referenced against known dbSNP, 1000 Genomes and COSMIC variants. Sanger sequencing was used to confirm each mutation from normal and tumor DNA.

Enrichment of amplified and deleted genes for single-nucleotide variants (SNVs). Although MutComFocal combines SNV and CNV data to identify genes driving oncogenesis, it does not explicitly determine whether amplified or deleted genes are enriched for SNVs within the same sample. Deletions and SNVs of a gene within the same sample might indicate a two-hit model of a tumor suppressor. Alternatively, amplifications and gain-of-function mutations of an oncogene within the same sample might further promote oncogenesis. For each MutComFocal candidate gene, the number of TCGA samples with both amplification and SNVs, amplification alone, SNVs alone or neither was determined. The corresponding Fisher's P value was calculated. A similar analysis for deletions was performed.
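For the 2×2 table described above (amplification and SNV, amplification alone, SNV alone, neither), Fisher's exact test can be computed directly from the hypergeometric distribution. The sketch below returns a two-sided P value; whether a one-sided or two-sided test was used is not stated, so the choice is an assumption, as is the function name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    a: both alterations, b: first alone, c: second alone, d: neither.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x samples in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

The same function applies unchanged to the deletion analysis and to the subtype enrichment of EGFR fusions, since both reduce to a 2×2 contingency table.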

Correlation between copy number and expression. One method of assessing the functional relevance of an amplified or deleted gene is to assess the effect of gene dosage. For each gene nominated by MutComFocal, the Pearson's correlation coefficient was calculated between copy number and expression. The corresponding P values were computed using paired Student's t test.

Allele-specific expression of SNVs. For a given gene nominated by MutComFocal, RNA sequencing can determine whether the mutant or wild-type allele is expressed. Toward this end, VCFtools⁵⁴ was applied to the TCGA BAM RNA-seq files produced by TopHat, yielding the depth of reads calling the reference (R) and variant (V) allele. A measure of relative expression of the variant allele is then V/(V+R). For each mutation, the binomial P value of observing more than V out of V+R reads was calculated, assuming that it is equally probable for a read to call the variant or reference. The binomial P values of each mutation were then pooled using the Stouffer's Z-score method to calculate the combined P value per gene.
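A minimal sketch of the two statistical steps follows. The binomial tail is computed here as the probability of at least V variant reads out of V+R under equal odds; the text says "more than V", so a strict reading would omit the k = V term. The unweighted form of Stouffer's method and the clamping of P values away from 0 and 1 are further assumptions, and the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_upper_p(v, r):
    """P(X >= v) for X ~ Binomial(v + r, 0.5)."""
    n = v + r
    return sum(comb(n, k) for k in range(v, n + 1)) / 2 ** n


def stouffer_combined_p(p_values):
    """Combine one-sided P values with the unweighted Stouffer Z-score method."""
    nd = NormalDist()
    eps = 1e-15   # keep inv_cdf away from 0 and 1
    zs = [nd.inv_cdf(1 - min(max(p, eps), 1 - eps)) for p in p_values]
    z = sum(zs) / sqrt(len(zs))
    return 1 - nd.cdf(z)


# Toy read depths (variant, reference) for three mutations in one gene.
depths = [(18, 6), (9, 3), (30, 22)]
per_mutation = [binomial_upper_p(v, r) for v, r in depths]
per_gene = stouffer_combined_p(per_mutation)
```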

Ruling out passenger mutations in hypermutated samples. To rule out the possibility that MutComFocal candidates tend to be passenger mutations in hypermutated samples, the number of mutations in samples harboring a MutComFocal mutation was compared to the distribution N of the number of mutations in each TCGA sample. Because the number of TCGA samples was well above 30, N was assumed to be well approximated by the normal distribution, and its mean, μ, and s.d., σ, were calculated. For each MutComFocal mutation, the Z-test was performed, and no mutation reached statistical significance after correction by the Benjamini-Hochberg method.
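The Z-test and the multiple-testing correction can be sketched as below. The test is written as one-sided (more mutations than expected), and the sample standard deviation is used for σ; both are assumptions, as the text does not specify them. The Benjamini-Hochberg routine returns adjusted P values in the input order.

```python
from statistics import NormalDist, mean, stdev


def upper_tail_z_test(x, reference_counts):
    """P value for x being larger than expected under a normal approximation."""
    mu = mean(reference_counts)
    sigma = stdev(reference_counts)   # assumption: sample standard deviation
    z = (x - mu) / sigma
    return 1 - NormalDist().cdf(z)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```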

Determining the presence of EGFRvIII transcripts. To determine the prevalence of EGFRvIII transcripts, an in-house script was created to calculate the number of split inserts and split reads supporting the junction between EGFR exons 1 and 8. The EGFRvIII isoform was considered to be expressed if there were more than five split reads or five split inserts in a sample.

Calculating the relative expression of EGFR fusions compared to wild-type EGFR. To determine the functional relevance of EGFR-SEPT14 and EGFR-PSPH fusions, the relative expression of the fusion and wild-type transcripts was determined within each sample on the basis of BAM files mapped by TopHat and provided by TCGA. As a proxy for expression of the transcript, the depth of reads covering either a mutant or wild-type junction was calculated. In particular, the depth of reads covering the fusion breakpoint of EGFR-SEPT14 or EGFR-PSPH was used to estimate the expression of the fusion transcript. Because all EGFR fusions stereotypically involved exon 24 joined to either SEPT14 or PSPH, the depth of reads covering the junctions between EGFR exons 25-26, 26-27 and 27-28 was taken as a specific gauge of wild-type EGFR expression.

Enrichment of the classical and mesenchymal subtype among samples with EGFR fusions. To assess whether samples with EGFR fusions tended to occur in a particular GBM subtype, each TCGA GBM sample was first classified by expression according to the methods of Verhaak et al.⁶. The number of classical, mesenchymal, proneural and neural samples with and without EGFR gene fusions was then tallied. The combined class of classical and mesenchymal phenotype was enriched for EGFR fusions according to the Fisher's exact test.

Copy number variation in EGFR fusions. Gene fusions often arise from genomic instability. Motivated by this observation, segmented SNP array data were downloaded from TCGA and the log2 ratio between the tumor and normal copy numbers was calculated. This ratio was plotted along the chromosomal neighborhood of EGFR, SEPT14 and PSPH (chr7:55,000,000-56,500,000).

GSEA. To determine the biological impact of LZTR1 mutations, GSEA⁵⁵ was used, which is an analytical tool that harnesses expression data to nominate gene sets enriched for a particular phenotype. Having identified TCGA samples with LZTR1 SNVs, GSEA was applied to the TCGA expression data. Samples with LZTR1 SNVs were first compared against those with wild-type LZTR1 (excluding LZTR1 deletions). To assess statistical significance, the data set was randomized by permuting gene sets 500 times, and only gene sets with an FDR q<0.05 were considered.

Differential expression between samples with EGFR-SEPT14 and EGFRvIII. In-house differential expression analysis was also performed to determine a distinct molecular signature distinguishing the EGFR-SEPT14 and EGFRvIII phenotypes. Toward this end, a t test was performed comparing the expression of the two groups of samples for each gene. After correction using the Benjamini-Hochberg method, only genes with FDR<0.05 were considered. In addition, genes with a variance less than the tenth percentile or an absolute value lower than two across all samples were excluded. These filters left a predictive set of ten genes. Hierarchical clustering was then performed on the expression of these ten genes using Euclidean distance and average linkage.

Modeling of LZTR1. Structural templates for the kelch and BTB-BACK regions of human LZTR1 were identified with HHpred⁵⁶. An initial three-dimensional model was generated with the I-TASSER server⁵⁷. The CUL3 N-terminal domain was docked onto the model by superposing the KLHL3^(BTB-BACK)/CUL3^(NTD) crystal structure²⁷ onto the second LZTR1 BTB-BACK domain. The model does not include higher quaternary structure, although many BTB domains, and many kelch domains, are known to self-associate²⁵. The short linkage between the end of the first BACK domain and the beginning of the second BTB domain would seem to preclude an intrachain BTB-BTB pseudo-homodimer, and without being bound by theory, LZTR1 should self-associate and form higher-order assemblies. Both BACK domains are the shorter, atypical form of the domain and consist of two helical hairpin motifs, as in SPOP^(26, 58), and not the four-hairpin motif seen in most BTB-BACK-kelch proteins^(28, 58). The model of the kelch domain predicts an unusual 1+3 velcro arrangement⁵⁹, with the N-terminal region contributing strand d of blade 1 and the C-terminal region contributing strands a, b and c of the same blade, although an alternative 2+2 velcro model cannot be ruled out.

Cell Culture. U87 cells were obtained from ATCC. SNB19, U87 and HEK-293T cells were cultured in DMEM supplemented with 10% fetal bovine serum (FBS). Growth rates were determined by plating cells in six-well plates at 3 d after infection with the lentivirus indicated in the figure legends. The number of viable cells was determined by Trypan blue exclusion in triplicate cultures obtained from triplicate independent infections. For the wound assay testing migration, confluent cells were scratched with a pipette tip and cultured in 0.25% FBS. After 16 h, images were taken using the Olympus IX70 connected to a digital camera. Images were processed using the ImageJ64 software. The area of the cell-free wound was assessed in triplicate samples. Experiments were repeated twice.

GBM-derived primary cultures were grown in DMEM:F12 medium containing N2 and B27 supplements and human recombinant FGF-2 and EGF (50 ng/ml each; Peprotech). For sphere formation, cells were infected with lentiviral particles. Four days later, single cells were plated at a density of 1 cell per well in triplicate in low-attachment 96-well plates. The number and the size of spheres were scored after 10-14 d. Limiting dilution assays were performed as described previously⁶⁰. Spheres were dissociated into single cells and plated in low-attachment 96-well plates in 0.2 ml of medium containing growth factors (EGF and FGF-2), except for the EGFR-transduced cells, which were cultured in the absence of EGF. Cultures were left undisturbed for 10 d, and then the percentage of wells not containing spheres for each cell dilution was calculated and plotted against the number of cells per well. Linear regression lines were plotted, and the number of cells required to generate at least one sphere in every well (the stem cell frequency) was calculated. The experiment was repeated twice. Treatment of GBM primary cultures with erlotinib or lapatinib was performed in cells transduced with the pLOC vector, wild-type pLOC-EGFR, EGFRvIII or EGFR-SEPT14 and selected with blasticidin for 5 d. Cells were seeded on 6-cm dishes in the absence of EGF and treated with the indicated drugs at the indicated doses for 48 h. Each treatment group was seeded in triplicate. Absolute viable cell counts were determined by Trypan blue exclusion and counted on a hemocytometer. EGF stimulation of EGFR-transduced primary glioma cells was performed in cells deprived of growth factors for 48 h. Cells were collected at the indicated times and processed for protein blot analysis.
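The limiting dilution read-out described above can be illustrated with an ordinary least-squares line through the percentage of sphere-negative wells, extrapolated to 0% negative wells. This is a sketch of one plausible reading of the procedure; the exact regression model used in the cited protocol is not given here, and the function name and the data in the usage lines are made up.

```python
def cells_for_all_positive_wells(cells_per_well, pct_negative_wells):
    """Fit y = slope * x + intercept and return the x at which y reaches 0%."""
    n = len(cells_per_well)
    mean_x = sum(cells_per_well) / n
    mean_y = sum(pct_negative_wells) / n
    sxx = sum((x - mean_x) ** 2 for x in cells_per_well)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(cells_per_well, pct_negative_wells))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope    # x-intercept of the regression line


# Made-up dilution series: cells plated per well and % of wells without spheres.
dilutions = [1, 5, 10, 20]
negative = [96.0, 80.0, 60.0, 20.0]
estimate = cells_for_all_positive_wells(dilutions, negative)
```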

Immunofluorescence. Immunofluorescence staining on normal mouse and human brain and brain tumor tissue microarrays was performed as previously described^(43, 61, 62). Immunofluorescence microscopy was performed on cells fixed with 4% paraformaldehyde in phosphate buffer. Cells were permeabilized using 0.2% Triton X-100. The antibodies and concentrations used in the immunofluorescence staining are detailed in FIG. 34.

Secondary antibodies conjugated to Alexa Fluor 594 (1:300, A11037, Molecular Probes) or Alexa 488 (1:500, A11008, Molecular Probes) were used. DNA was stained with DAPI (Sigma). Fluorescence microscopy was performed on a Nikon A1R MP microscope. Quantification of the fluorescence intensity staining in primary or established glioma cells was performed using NIH ImageJ software (see URLs). A histogram of the intensity of fluorescence of each point of a representative field for each condition was generated. The fluorescence intensity of ten fields from three independent experiments was scored, standardized to the number of cells in the field and divided by the intensity of the vector.

Protein blotting, immunoprecipitation and in vitro binding. Protein blot analysis and immunoprecipitation were performed using the antibodies detailed in FIG. 35. For the in vitro binding between CUL3 and LZTR1, wild-type and mutant LZTR1 were translated in vitro using the TNT Quick Coupled Transcription/Translation System (Promega). Flag-CUL3 was immunoprecipitated from transfected HEK-293T cells with Flag-M2 beads (Sigma) using RIPA buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate (DOC), 0.1% SDS, 1 mM phenylmethylsulfonyl fluoride (PMSF), 10 mM NaF, 0.5 M Na₃VO₄ (sodium orthovanadate) and Complete Protease Inhibitor Cocktail, Roche). Binding was performed in 200 mM NaCl plus 0.5% NP-40 for 2 h at 4° C. Immunocomplexes were analyzed by SDS-PAGE and immunoblot.

Cloning and Lentiviral Production.

The lentiviral expression vectors pLOC-GFP and pLOC-CTNND2 were purchased from Open Biosystems. Full-length EGFR-SEPT14 cDNA was amplified from tumor sample TCGA-27-1837. Wild-type EGFR, EGFRvIII and EGFR-SEPT14 cDNAs were cloned into the pLOC vector. pCDNA-MYC-Hist-LZTR1 was a kind gift²⁴. pCDNA-Flag-CUL3 was a gift. Wild-type and mutant cDNAs for LZTR1 and CTNND2 obtained by site-directed mutagenesis (QuikChange II, Agilent) were cloned into the pLOC vector. Lentiviral particles were produced using published protocols^(43, 61, 62, 63, 64).

Genomic PCR and RT-PCR. Total RNA was extracted from cells using an RNeasy Mini Kit (QIAGEN) following the manufacturer's instructions. Five-hundred nanograms of total RNA was retrotranscribed using the Superscript III kit (Invitrogen) following the manufacturer's instructions. The cDNAs obtained after the retrotranscription were used as templates for quantitative PCR as described^(43, 64). The reaction was performed with a Roche480 thermal cycler using the Absolute Blue QPCR SYBR Green Mix from Thermo Scientific. The relative amount of specific mRNA was normalized to GAPDH. Results are presented as the mean±s.d. of triplicate amplifications. The validation of fusion transcripts was performed using both genomic PCR and RT-PCR with forward and reverse primer combinations designed within the margins of the paired-end read sequences detected by RNA-seq. Expressed fusion transcript variants were subjected to direct sequencing to confirm the sequence and translation frame. The primers used for the screening of gene fusions are detailed in FIG. 36. The primers used for genomic detection of gene fusions are listed in FIG. 37. Semiquantitative RT-PCR to detect exogenous wild-type MYC-LZTR1 and mutant p.Arg801Trp LZTR1 was performed using the primers listed in FIG. 38.

Subcutaneous xenografts and drug treatment. Female athymic mice (nu/nu genotype, BALB/c background, 6-8 weeks old) were used for all antitumor studies. Patient-derived adult human glioblastoma xenografts were maintained. Xenografts were excised from host mice under sterile conditions and homogenized with the use of a tissue press and modified tissue cytosieve (Biowhitter Inc.), and tumor homogenate was loaded into a repeating Hamilton syringe (Hamilton, Co.) dispenser. Cells were injected subcutaneously into the right flank of the athymic mouse at an inoculation volume of 50 μl with a 19-gauge needle⁶⁵.

Subcutaneous tumors were measured twice weekly with hand-held vernier calipers (Scientific Products). Tumor volumes (V) were calculated with the following formula: ((width)²×(length))/2=V (mm³). For the subcutaneous tumor studies, groups of mice randomly selected by tumor volume were treated with EGFR kinase inhibitors when the median tumor volumes were an average of 150 mm³ and were compared with control animals receiving vehicle (saline).

Erlotinib was administered at 100 mg per kg body weight orally once per day for 10 d. Lapatinib was administered at 75 mg per kg body weight orally twice per day for 20 d. Response to treatment was assessed by a delay in tumor growth and tumor regression.

Growth delay, expressed as a T-C value, is defined as the difference in days between the median time required for tumors in treated and control animals to reach a volume five times greater than that measured at the start of the treatment. Tumor regression is defined as a decrease in tumor volume over two successive measurements. Statistical analysis was performed using a SAS statistical analysis program, the Wilcoxon rank-order test for growth delay and Fisher's exact test for tumor regression.
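The tumor volume formula and the T-C growth delay can be sketched as follows. The time to reach five times the starting volume is taken as the first measurement day on which that volume is met or exceeded, without interpolation between measurements; this simplification, the function names and the example measurements are all illustrative assumptions.

```python
from statistics import median


def tumor_volume(width_mm, length_mm):
    """V (mm^3) = (width^2 x length) / 2."""
    return (width_mm ** 2) * length_mm / 2


def days_to_fivefold(days, volumes):
    """First measurement day on which the volume is >= 5x the starting volume."""
    target = 5 * volumes[0]
    for day, vol in zip(days, volumes):
        if vol >= target:
            return day
    return None


def growth_delay(treated, control):
    """T-C: median days to 5x volume in treated minus control animals.

    Each group is a list of (days, volumes) series, one per animal.
    """
    def median_time(group):
        times = [days_to_fivefold(d, v) for d, v in group]
        if any(t is None for t in times):
            raise ValueError("a tumor never reached five times its starting volume")
        return median(times)

    return median_time(treated) - median_time(control)


# Made-up measurement series (mm^3), two animals per group.
days = [0, 4, 7, 11, 14]
control = [(days, [150, 400, 800, 1500, 2500]), (days, [150, 300, 750, 1600, 2600])]
treated = [(days, [150, 200, 300, 500, 760]), (days, [150, 180, 260, 800, 1200])]
delay = growth_delay(treated, control)
```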

Intracranial injection. GBM-derived primary cells were first infected with a lentivirus expressing luciferase and subsequently transduced with the pLOC vector or pLOC-CTNND2 lentiviral particles. Intracranial injection was performed in 9-week-old male nu/nu mice (Charles River Laboratories). Briefly, 5×10⁵ cells were resuspended in 2.5 μl of PBS and injected into the caudate putamen using a stereotaxic frame (coordinates relative to the bregma: 0.6 mm anterior; 1.65 mm medial-lateral; 3 mm depth-ventral). Tumor growth was monitored using the IVIS Imaging system. Briefly, mice were anesthetized with 3% isoflurane before intraperitoneal injection of 100 mg per kg body weight D-luciferin (Xenogen). Ten minutes after injection of D-luciferin, images were acquired for 1 min with the Xenogen IVIS system (Xenogen) using Living Image acquisition and analysis software (Xenogen). The bioluminescent signal was expressed in photons per second and displayed as a pseudo-color image representing the spatial distribution of photon counts.

URLs. DNA and RNA sequencing and copy number variant data in The Cancer Genome Atlas (TCGA), http://cancergenome.nih.gov; glioma patient survival data from the Repository for Molecular Brain Neoplasia Data (REMBRANDT), https://caintegrator.nci.nih.gov/rembrandt/; sequence data deposition in database of Genotypes and Phenotypes (dbGaP), http://www.ncbi.nlm.nih.gov/gap; gene fusion annotation software package Pegasus, http://sourceforge.net/projects/pegasus-fus/.

Data access. RNA sequencing data from twenty-four human GBM sphere cultures in this study were deposited under the dbGaP study accession phs000505.v2.p1. RNA and DNA sequencing data of TCGA GBM samples were also analyzed from the dbGaP study accession phs000178.v1.p1.

REFERENCES FOR EXAMPLE 2

-   1 Porter, K. R., McCarthy, B. J., Freels, S., Kim, Y. & Davis, F. G. Prevalence estimates for primary brain tumors in the United States by age, gender, behavior, and histology. Neuro-oncology 12, 520-527, doi:10.1093/neuonc/nop066 (2010).
-   2 Stupp, R. et al. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. The New England journal of medicine 352, 987-996, doi:10.1056/NEJMoa043330 (2005).
-   3 Cancer Genome Atlas Research, N. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061-1068, doi:10.1038/nature07385 (2008).
-   4 Noushmehr, H. et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 17, 510-522, doi:10.1016/j.ccr.2010.03.017 (2010).
-   5 Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807-1812, doi:10.1126/science.1164382 (2008).
-   6 Verhaak, R. G. et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17, 98-110, doi:10.1016/j.ccr.2009.12.020 (2010).
-   7 Bass, A. J. et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet 43, 964-968, doi:10.1038/ng.936 (2011).
-   8 Chinnaiyan, A. M. & Palanisamy, N. Chromosomal aberrations in solid tumors. Prog Mol Biol Transl Sci 95, 55-94, doi:10.1016/B978-0-12-385071-3.00004-6 (2010).
-   9 Singh, D. et al. Transforming fusions of FGFR and TACC genes in human glioblastoma. Science 337, 1231-1235, doi:10.1126/science.1220834 (2012).
-   10 Rubin, A. F. & Green, P. Mutation patterns in cancer genomes. Proc Natl Acad Sci USA 106, 21766-21770, doi:10.1073/pnas.0912499106 (2009).
-   11 Fan, Z. et al. BCOR regulates mesenchymal stem cell function by epigenetic mechanisms. Nat Cell Biol 11, 1002-1009, doi:10.1038/ncb1913 (2009).
-   12 Wamstad, J. A. & Bardwell, V. J. Characterization of Bcor expression in mouse development. Gene Expr Patterns 7, 550-557, doi:10.1016/j.modgep.2007.01.006 (2007).
-   13 Wamstad, J. A., Corcoran, C. M., Keating, A. M. & Bardwell, V. J. Role of the transcriptional corepressor Bcor in embryonic stem cell differentiation and early embryonic development. PLoS One 3, e2814, doi:10.1371/journal.pone.0002814 (2008).
-   14 Pugh, T. J. et al. Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature 488, 106-110, doi:10.1038/nature11329 (2012).
-   15 Zhang, J. et al. A novel retinoblastoma therapy from genomic and epigenetic analyses. Nature 481, 329-334, doi:10.1038/nature10733 (2012).
-   16 Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899-905, doi:10.1038/nature08822 (2010).
-   17 Kantarci, S. et al. Mutations in LRP2, which encodes the multiligand receptor megalin, cause Donnai-Barrow and facio-oculo-acoustico-renal syndromes. Nat Genet 39, 957-959, doi:10.1038/ng2063 (2007).
-   18 Willnow, T. E. et al. Defective forebrain development in mice lacking gp330/megalin. Proc Natl Acad Sci USA 93, 8460-8464 (1996).
-   19 Christ, A. et al. LRP2 is an auxiliary SHH receptor required to condition the forebrain ventral midline for inductive signals. Dev Cell 22, 268-278, doi:10.1016/j.devcel.2011.11.023 (2012).
-   20 Cowin, P. A. et al. LRP1B deletion in high-grade serous ovarian cancers is associated with acquired chemotherapy resistance to liposomal doxorubicin. Cancer Res 72, 4060-4073, doi:10.1158/0008-5472.CAN-12-0203 (2012).
-   21 Lima, F. R. et al. Glioblastoma: therapeutic challenges, what lies ahead. Biochim Biophys Acta 1826, 338-349, doi:10.1016/j.bbcan.2012.05.004 (2012).
-   22 Bekker-Jensen, S. et al. HERC2 coordinates ubiquitin-dependent assembly of DNA repair factors on damaged chromosomes. Nat Cell Biol 12, 80-86; sup pp 81-12, doi:10.1038/ncb2008 (2010).
-   23 Harlalka, G. V. et al. Mutation of HERC2 causes developmental delay with Angelman-like features. J Med Genet 50, 65-73, doi:10.1136/jmedgenet-2012-101367 (2013).
-   24 Nacak, T. G., Leptien, K., Fellner, D., Augustin, H. G. & Kroll, J. The BTB-kelch protein LZTR-1 is a novel Golgi protein that is degraded upon induction of apoptosis. J Biol Chem 281, 5065-5071, doi:10.1074/jbc.M509073200 (2006).
-   25 Stogios, P. J., Downs, G. S., Jauhal, J. J., Nandra, S. K. & Prive, G. G. Sequence and structural analysis of BTB domain proteins. Genome Biol 6, R82, doi:10.1186/gb-2005-6-10-r82 (2005).
-   26 Errington, W. J. et al. Adaptor protein self-assembly drives the control of a cullin-RING ubiquitin ligase. Structure 20, 1141-1153, doi:10.1016/j.str.2012.04.009 (2012).
-   27 Ji, A. X. & Prive, G. G. Crystal structure of KLHL3 in complex with Cullin3. PLoS One 8, e60445, doi:10.1371/journal.pone.0060445 (2013).
-   28 Canning, P. et al. Structural basis for Cul3 assembly with the BTB-Kelch family of E3 ubiquitin ligases. J Biol Chem, doi:10.1074/jbc.M112.437996 (2013).
-   29 Lo, S. C., Li, X., Henzl, M. T., Beamer, L. J. & Hannink, M. Structure of the Keap1:Nrf2 interface provides mechanistic insight into Nrf2 signaling. EMBO J 25, 3605-3617, doi:10.1038/sj.emboj.7601243 (2006).
-   30 Boyden, L. M. et al. Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98-102, doi:10.1038/nature10814 (2012).
-   31 Louis-Dit-Picard, H. et al. KLHL3 mutations cause familial hyperkalemic hypertension by impairing ion transport in the distal nephron. Nat Genet 44, 456-460, S451-453, doi:10.1038/ng.2218 (2012).
-   32 Emanuele, M. J. et al. Global identification of modular cullin-RING ligase substrates. Cell 147, 459-474, doi:10.1016/j.cell.2011.09.019 (2011).
-   33 Galan, J. M. & Peter, M. Ubiquitin-dependent degradation of multiple F-box proteins by an autocatalytic mechanism. Proc Natl Acad Sci USA 96, 9124-9129 (1999).
-   34 Zhang, D. D. et al. Ubiquitination of Keap1, a BTB-Kelch substrate adaptor protein for Cul3, targets Keap1 for degradation by a proteasome-independent pathway. J Biol Chem 280, 30091-30099, doi:10.1074/jbc.M501279200 (2005).
-   35 Gunther, H. S. et al. Glioblastoma-derived stem cell-enriched cultures form distinct subgroups according to molecular and phenotypic criteria. Oncogene 27, 2897-2909, doi:10.1038/sj.onc.1210949 (2008).
-   36 Abu-Elneel, K. et al. A delta-catenin signaling pathway leading to dendritic protrusions. J Biol Chem 283, 32781-32791, doi:10.1074/jbc.M804688200 (2008).
-   37 Arikkath, J. et al. Delta-catenin regulates spine and synapse morphogenesis and function in hippocampal neurons during development. J Neurosci 29, 5435-5442, doi:10.1523/JNEUROSCI.0835-09.2009 (2009).
-   38 Kosik, K. S., Donahue, C. P., Israely, I., Liu, X. & Ochiishi, T. Delta-catenin at the synaptic-adherens junction. Trends Cell Biol 15, 172-178, doi:10.1016/j.tcb.2005.01.004 (2005).
-   39 Israely, I. et al. Deletion of the neuron-specific protein delta-catenin leads to severe cognitive and synaptic dysfunction. Curr Biol 14, 1657-1663, doi:10.1016/j.cub.2004.08.065 (2004).
-   40 Jun, G. et al. delta-Catenin is genetically and biologically associated with cortical cataract and future Alzheimer-related structural and functional brain changes. PLoS One 7, e43728, doi:10.1371/journal.pone.0043728 (2012).
-   41 Hicks, S., Wheeler, D. A., Plon, S. E. & Kimmel, M. Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed. Hum Mutat 32, 661-668, doi:10.1002/humu.21490 (2011).
-   42 Phillips, H. S. et al. Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9, 157-173, doi:10.1016/j.ccr.2006.02.019 (2006).
-   43 Carro, M. S. et al. The transcriptional network for mesenchymal transformation of brain tumours. Nature 463, 318-325, doi:10.1038/nature08712 (2010).
-   44 Pierotti, M. A. & Greco, A. Oncogenic rearrangements of the NTRK1/NGF receptor. Cancer Lett 232, 90-98, doi:10.1016/j.canlet.2005.07.043 (2006).
-   45 Dunn, G. P. et al. Emerging insights into the molecular and cellular basis of glioblastoma. Genes Dev 26, 756-784, doi:10.1101/gad.187922.112 (2012).
-   46 Liu, C. et al. Chemokine receptor CXCR3 promotes growth of glioma. Carcinogenesis 32, 129-137, doi:10.1093/carcin/bgq224 (2011).
-   47 Vivanco, I. et al. Differential sensitivity of glioma- versus lung cancer-specific EGFR mutations to EGFR kinase inhibitors. Cancer Discov 2, 458-471, doi:10.1158/2159-8290.CD-11-0284 (2012).
-   48 Forbes, S. A. et al. COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer. Nucleic Acids Res 38, D652-657, doi:10.1093/nar/gkp995 (2010).
-   49 Northcott, P. A. et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature 488, 49-56, doi:10.1038/nature11327 (2012).
-   Srivastava, M. et al. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466, 720-726 (2010).
-   Stogios et al. Sequence and structural analysis of BTB domain proteins. Genome Biol. 6(10):R82 (2005).
-   Soding, J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 21(7):951-60 (2005).

ANNOTATIONS TO FIGURES

Annotation information in each column is described below for FIG. 27:

-   sample: Name of TCGA or private sample
-   #chrom5p: 5′ chromosome
-   #start5p: 5′ genomic start coordinate
-   #end5p: 5′ genomic end coordinate
-   #chrom3p: 3′ chromosome
-   #start3p: 3′ genomic start coordinate
-   #end3p: 3′ genomic end coordinate
-   strand5p: 5′ strand
-   strand3p: 3′ strand
-   genes5p: 5′ gene
-   genes3p: 3′ gene
-   total_frags (split inserts+split reads): Total number of split inserts and split reads
-   spanning_frags (split reads): Number of split reads
-   GeneBreakpoint5p: The genomic coordinate of the breakpoint in the 5′ gene
-   GeneBreakpoint3p: The genomic coordinate of the breakpoint in the 3′ gene
-   FrameType: Reading frame of gene fusions. Values include in-frame, frameshift, or null (no transcript information was found in the Ensembl Homo_sapiens.GRCh37.60.gtf file).
-   FusedSequence: Reconstructed sequence of the fusion RNA transcript
-   ProteinStart5p: The start coordinate of the 5′ protein segment
-   ProteinStop5p: The stop coordinate (breakpoint) of the 5′ protein segment
-   ProteinStart3p: The start coordinate (breakpoint) of the 3′ protein segment
-   ProteinStop3p: The stop coordinate of the 3′ protein segment
-   ProteinSequence: Reconstructed sequence of the fusion protein
-   ExonBreak5p: The last exon of the 5′ gene before the breakpoint
-   ExonBreak3p: The first exon of the 3′ gene after the breakpoint
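For illustration only, a table with the columns listed for FIG. 27 could be filtered for in-frame fusions roughly as sketched below. The sketch assumes the table has been exported as tab-delimited text with a header row and that the column headers are exactly `sample`, `genes5p`, `genes3p`, `FrameType`, and `spanning_frags`. The function name, the threshold parameter, and the placeholder rows are hypothetical and are not data from the figures.

```python
import csv
import io
from typing import List, Tuple


def in_frame_fusions(tsv_text: str, min_spanning: int = 1) -> List[Tuple[str, str]]:
    """Return (sample, "5p-3p") pairs for in-frame fusions with enough split reads."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        # FrameType is described as in-frame, frameshift, or null.
        if row["FrameType"] != "in-frame":
            continue
        if int(row["spanning_frags"]) < min_spanning:
            continue
        hits.append((row["sample"], f'{row["genes5p"]}-{row["genes3p"]}'))
    return hits


# Placeholder rows (not real data) using an assumed subset of the columns.
example_tsv = (
    "sample\tgenes5p\tgenes3p\tFrameType\tspanning_frags\n"
    "sample_A\tGENE_A\tGENE_B\tin-frame\t5\n"
    "sample_B\tGENE_C\tGENE_D\tframeshift\t9\n"
    "sample_C\tGENE_E\tGENE_F\tin-frame\t0\n"
)
example_hits = in_frame_fusions(example_tsv)
```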

Annotation information in each column is described below for FIG. 28:

-   sample: Name of TCGA sample
-   split reads: Total number of split reads
-   gene5p: 5′ gene
-   chr5p: 5′ chromosome
-   sense5p: 5′ sense
-   start5p: 5′ genomic start coordinate
-   end5p: 5′ genomic end coordinate
-   breakpoint5p: 5′ genomic coordinate of breakpoint
-   exonBeforeBreakpoint5p: Exon number of 5′ gene before the breakpoint
-   gene3p: 3′ gene
-   chr3p: 3′ chromosome
-   sense3p: 3′ sense
-   start3p: 3′ genomic start coordinate
-   end3p: 3′ genomic end coordinate
-   breakpoint3p: 3′ genomic coordinate of breakpoint
-   exonAfterBreakpoint3p: Exon number of 3′ gene after the breakpoint
-   split inserts: Total number of split inserts
-   posA5p: Coordinate of split insert read closest to 5′ end in 5′ gene
-   posB5p: Coordinate of split insert read closest to 3′ end in 5′ gene
-   readDir5p: Read direction of split insert reads in 5′ gene
-   posA3p: Coordinate of split insert read closest to 5′ end in 3′ gene
-   posB3p: Coordinate of split insert read closest to 3′ end in 3′ gene
-   readDir3p: Read direction of split insert reads in 3′ gene

Annotation information in each column is described below for FIG. 31:

-   sample: Name of TCGA sample
-   split reads: Total number of split reads
-   gene5p: 5′ gene
-   chr5p: 5′ chromosome
-   sense5p: 5′ sense
-   start5p: 5′ genomic start coordinate
-   end5p: 5′ genomic end coordinate
-   breakpoint5p: 5′ genomic coordinate of breakpoint
-   exonBeforeBreakpoint5p: Exon number of 5′ gene before the breakpoint
-   gene3p: 3′ gene
-   chr3p: 3′ chromosome
-   sense3p: 3′ sense
-   start3p: 3′ genomic start coordinate
-   end3p: 3′ genomic end coordinate
-   breakpoint3p: 3′ genomic coordinate of breakpoint
-   exonAfterBreakpoint3p: Exon number of 3′ gene after the breakpoint
-   split inserts: Total number of split inserts
-   posA5p: Coordinate of split insert read closest to 5′ end in 5′ gene
-   posB5p: Coordinate of split insert read closest to 3′ end in 5′ gene
-   readDir5p: Read direction of split insert reads in 5′ gene
-   posA3p: Coordinate of split insert read closest to 5′ end in 3′ gene
-   posB3p: Coordinate of split insert read closest to 3′ end in 3′ gene
-   readDir3p: Read direction of split insert reads in 3′ gene

What is claimed is:
 1. A cDNA encoding a fusion protein comprising the tyrosine kinase domain of EGFR fused to: (i) a phosphoserine phosphatase (PSPH) protein; or (ii) a Cullin-associated and neddylation-dissociated (CAND) protein.
 2. The cDNA of claim 1, wherein the CAND protein is CAND1, CAND2, or CAND3.
 3. The cDNA of claim 1, wherein the CAND protein is CAND1.
 4. The cDNA of claim 1, wherein the fusion protein is EGFR-CAND1 or EGFR-PSPH.
 5. The cDNA of claim 1, wherein the cDNA comprising the tyrosine kinase domain of EGFR fused to a PSPH protein comprises SEQ ID NO: 8 or SEQ ID NO: 10 or has a genomic breakpoint comprising SEQ ID NO: 10.
 6. The cDNA of claim 1, wherein the cDNA comprising the tyrosine kinase domain of EGFR fused to a CAND protein comprises SEQ ID NO: 14 or SEQ ID NO: 15, or has a genomic breakpoint comprising SEQ ID NO: 15.
 7. A purified fusion protein comprising the tyrosine kinase domain of EGFR fused to: (i) a phosphoserine phosphatase (PSPH) protein; or (ii) a Cullin-associated and neddylation-dissociated (CAND) protein.
 8. The purified fusion protein of claim 7, wherein the fusion protein is EGFR-CAND1 or EGFR-PSPH.
 9. The purified fusion protein of claim 7, wherein the fusion protein comprises SEQ ID NO: 7, 11, 13, or 8495.
 10. A method of decreasing growth of a solid tumor in a subject in need thereof, the method comprising administering to the subject an effective amount of an EGFR fusion molecule inhibitor, wherein the inhibitor decreases the size of the solid tumor, and wherein the EGFR fusion comprises the tyrosine kinase domain of EGFR fused to: (i) the coiled-coil domain of a Septin protein; (ii) a phosphoserine phosphatase (PSPH) protein; or (iii) a Cullin-associated and neddylation-dissociated (CAND) protein.
 11. The method of claim 10, wherein the solid tumor comprises glioblastoma multiforme, breast cancer, lung cancer, prostate cancer, or colorectal carcinoma.
 12. The method of claim 11, wherein the solid tumor comprises glioblastoma multiforme.
 13. The method of claim 10, wherein the inhibitor comprises an antibody that specifically binds to an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, an EGFR-CAND fusion protein, or a fragment thereof.
 14. The method of claim 10, wherein the inhibitor comprises a small molecule that specifically binds to an EGFR protein.
 15. The method of claim 10, wherein the inhibitor comprises an antisense RNA or antisense DNA that decreases expression of an EGFR-SEPT fusion protein, an EGFR-PSPH fusion protein, or an EGFR-CAND fusion protein.
 16. The method of claim 10, wherein the inhibitor comprises a siRNA that specifically targets an EGFR-SEPT fusion gene, an EGFR-PSPH fusion gene, or an EGFR-CAND fusion gene; or a combination thereof.
 17. The method of claim 10, wherein the fusion protein is EGFR-SEPT14, EGFR-CAND1, or EGFR-PSPH.
 18. The method of claim 14, wherein the small molecule that specifically binds to an EGFR protein comprises AZD4547, NVP-BGJ398, PD173074, NF449, TK1258, BIBF-1120, BMS-582664, AZD-2171, TSU68, AB1010, AP24534, E-7080, LY2874455, or a combination thereof.